Odd PPC inline asm constraint

Hello,

I am not sure whether this is a clang issue, an LLVM issue, or both;
but clang chokes when parsing expanded macros from the
glibc /usr/include/bits/fenvinline.h with an error like:

./boost/math/tools/config.hpp:279:10: error: invalid input constraint 'i#*X' in asm
         feclearexcept(FE_ALL_EXCEPT);
         ^
/usr/include/bits/fenvinline.h:56:11: note: expanded from macro 'feclearexcept'
                           : : "i#*X"(__builtin_ffs (__excepts))); \
                               ^
1 error generated.

There is a comment in the file which reads:

/* The weird 'i#*X' constraints on the following suppress a gcc
   warning when __excepts is not a constant. Otherwise, they mean the
   same as just plain 'i'. */

For example, one of the full macros is:

/* Inline definition for feclearexcept. */
# define feclearexcept(__excepts) \
  ((__builtin_constant_p (__excepts) \
    && ((__excepts) & ((__excepts)-1)) == 0 \
    && (__excepts) != FE_INVALID) \
   ? ((__excepts) != 0 \
      ? (__extension__ ({ __asm__ __volatile__ \
          ("mtfsb0 %s0" \
           : : "i#*X"(__builtin_ffs (__excepts))); \
          0; })) \
      : 0) \
   : (feclearexcept) (__excepts))

Does anyone know what that "weird" asm constraint actually means? (and
where I should add support for it?)

Thanks again,
Hal

> There is a comment in the file which reads:
>
> /* The weird 'i#*X' constraints on the following suppress a gcc
>    warning when __excepts is not a constant. Otherwise, they mean the
>    same as just plain 'i'. */
>
> [snip]
>
> ("mtfsb0 %s0" : : "i#*X"(__builtin_ffs (__excepts)));
>
> [snip]
>
> Does anyone know what that "weird" asm constraint actually means?

The "i" and "X" constraints are documented here:

  Simple Constraints - Using the GNU Compiler Collection (GCC)

`i'
    An immediate integer operand (one with constant value) is allowed.
    This includes symbolic constants whose values will be known only
    at assembly time or later.

`X'
    Any operand whatsoever is allowed.

The # and * constraint modifiers are documented here:

  Modifiers - Using the GNU Compiler Collection (GCC)

`#'
    Says that all following characters, up to the next comma, are to
    be ignored as a constraint. They are significant only for choosing
    register preferences.

`*'
    Says that the following character should be ignored when choosing
    register preferences. `*' has no effect on the meaning of the constraint
    as a constraint, and no effect on reloading.

For more info about PowerPC specific constraints, you'll want to look here:

  Machine Constraints - Using the GNU Compiler Collection (GCC)

I'll note that the "s" in the %s0 means to only print the low 5 bits of
operand 0. I think that may only be documented in the src:

  gcc/config/rs6000/rs6000.c:print_operand()

Peter

Peter,

Thanks! Do you happen to know where this needs to be changed in clang
or LLVM? The code that actually interprets the constraints,
generically, is in CodeGen/SelectionDAG/TargetLowering.cpp. Is clang
relying on that code, or is there some frontend code in clang itself
that is failing to initially interpret the string? If it is the code in
TargetLowering, then I don't see any support there for '*' or '#'.

-Hal

Heh, I'm afraid I have no clue as to where clang needs to be changed.
I'm the team lead for IBM's Linux on POWER GCC development team, so
I can help you with questions about PPC hardware, PPC ABIs and why
GCC does things the way it does on PPC, but I'll not be of much
help with LLVM itself. I'm just a lurker here. :-)

That said, I'm curious about the extent of LLVM's support for PPC.
How robust is it? Does it support generating both 32-bit and 64-bit
binaries?

I'll note that although I work on GCC, I have no problems seeing LLVM
supporting PPC. The more the merrier.

Peter

[snip]

> That said, I'm curious about the extent of LLVM's support for PPC.
> How robust is it? Does it support generating both 32-bit and 64-bit
> binaries?

I use clang/LLVM to generate code for both ppc and ppc64. The support seems
pretty robust so far: http://ellcc.org

-Rich

> Thanks! Do you happen to know where this needs to be changed in
> clang or LLVM. The code that actually interprets the constraints,
> generically, is in CodeGen/SelectionDAG/TargetLowering.cpp, is clang
> relying on that code, or is there some frontend code in clang itself
> that is failing to initially interpret the string? If it is the
> code in TargetLowering, then I don't see any support there for '*'
> or '#'.

> Heh, I'm afraid I have no clue as to where clang needs to be changed.
> I'm the team lead for IBM's Linux on POWER GCC development team, so
> I can help you with questions about PPC hardware, PPC ABIs and why
> GCC does things the way it does on PPC, but I'll not be of much
> help with LLVM itself. I'm just a lurker here. :-)

That's great, thanks!

> That said, I'm curious about the extent of LLVM's support for PPC.
> How robust is it? Does it support generating both 32-bit and 64-bit
> binaries?

LLVM supports generating both 32-bit and 64-bit binaries. I have used
LLVM/clang to compile large and important codes on our Blue Gene
supercomputers (and their POWER frontend nodes), including some that
use the Boost C++ libraries; these codes run well and the performance
is often quite reasonable. I've recently added processor itineraries for
both the 440/450 and A2 embedded cores, and the code generation for
these cores is now really quite good. There are some deficiencies; here
are some that come to mind:

- Support for the 128-bit double-double format used for long doubles
   on Linux (and AIX) is currently broken [I am actively working on
   fixing this].
- There is no support for generating position-independent code on
   PPC32. (PIC on PPC64 now works well). Nevertheless, I have sometimes
   run into linking errors when compiling shared libraries with C++ on
   PPC64.
- There is no support for TLS.
- Support for inline asm needs improvement (it often works, but
   sometimes I've run across unsupported constructs [as in this
   thread]).
- The lowering code that generates the update forms of the load and
   store instructions is currently buggy (and is disabled by default)
   [small test cases work, but enabling this on the test suite induces
   runtime failures]. This is currently my top priority for performance
   fixes (I am not sure how important it is on POWER, but on the
   embedded cores it makes a big difference).
- There is currently no support for generating loops using
   control-registers for branch and increment (I am not sure if this
   matters on POWER, but it does make some difference for small
   trip-count loops on the embedded cores).
- Register reservations can use some improvement. We currently need to
   reserve an additional register to handle the corner case where a
   condition register needs to be spilled into a large stack frame (one
   register to compute the address, and a second one into which to
   transfer the condition register's contents). I'd like to improve
   this at some point.

So if you stick to static linking and don't use TLS or long doubles,
then it actually works quite well. Dynamic linking on PPC64 works most
of the time. I've tried to keep the PPC 970 hazard detector in working
order, but I've never really done much of a performance study on the
non-embedded cores. Assistance with any of this would, of course, be
greatly appreciated.

> I'll note that although I work on GCC, I have no problems seeing LLVM
> supporting PPC. The more the merrier.

Good! :-)

-Hal

I forgot to add:
  - Altivec support currently seems broken (there are some tests with
    Altivec intrinsics in the test suite, and these all fail to compile).
  - There is no VSX support.

-Hal

Roman pointed out to me that I misspoke. LLVM only generates PIC on
Darwin, not for ELF. What does work on PPC64 is dynamic linking
(meaning that it will correctly put a nop after each call so that the
linker can do its thing). To support dynamic linking on PPC32 we'd need
to explicitly add other things (stubs?), and that is not implemented.

-Hal

Just two random points: we can compile and boot the FreeBSD kernel, and I
was able to self-host a statically compiled clang/llvm.

Besides the missing PIC support and not saving FP args when CR bit 6 is not set,
there's nothing (famous last words) preventing FreeBSD from using clang
as its default compiler on PPC.

roman

> > - There is no support for generating position-independent code on
> > PPC32. (PIC on PPC64 now works well). Nevertheless, I have
> > sometimes run into linking errors when compiling shared libraries
> > with C++ on PPC64.

PPC64 is PIC by nature. As for the linking issue, possibly you blew
the TOC with too many entries? It used to be that even with GCC, we could
not compile doxygen (with or without -mminimal-toc) without filling
up the TOC and hitting the TOC overflow linker error. To fix those types
of problems, we recently added two more code models to GCC/binutils, so
we're no longer limited to 16-bit TOC offsets. We now have -mcmodel=medium
(32-bit TOC offsets) and -mcmodel=large (64-bit TOC offsets), with
-mcmodel=medium being the new GCC default (on PPC64). The old TOC code
is now called -mcmodel=small.

> > - There is currently no support for generating loops using
> > control-registers for branch and increment (I am not sure if this
> > matters on POWER, but it does make some difference for small
> > trip-count loops on the embedded cores).

It helps on our server class hardware too, so we do make use of it.

> > - Register reservations can use some improvement. We currently need
> > to reserve an additional register to handle the corner case where a
> > condition register need to be spilled into a large stack frame
> > (one register to compute the address, and a second one into which to
> > transfer the condition register's contents). I'd like to improve
> > this at some point.

Reserve as in you don't allow anything to be allocated to it just in
the uncommon case you have to spill a condition reg to a stack slot you
cannot write to with a 16-bit offset? Speaking as a person who has
implemented register allocators, that is bad!

> Roman pointed out to me that I misspoke. LLVM only generates PIC on
> Darwin, not for ELF. What does work on PPC64 is dynamic linking
> (meaning that it will correctly put nop after the calls so that the
> linker can do its thing). To support dynamic linking on PPC32 we'd need
> to explicitly add other things (stubs?) and that is not implemented.

If by stubs you mean PLT call stubs, those are created by the linker
for both PPC and PPC64 binaries.

I'm not sure what distro you're running on, but you may be hitting
the new 32-bit secure-plt implementation all new distros are using.
The old 32-bit PLT code used to generate a branch/return to the GOT
and the updated LR value was used to gain addressability to the GOT.
The problem is that the GOT is in the data section, so for that to
work, the data section of your program had to be marked executable.
With -msecure-plt (the new default for all new distros), that is
no longer the case. Maybe the non-secure-plt code isn't playing
well with the system crt*.o files and libs?

Are there build directions for building LLVM for ppc/ppc64?
I thought I had read that clang didn't work for ppc/ppc64 and that
you had to use the llvm-gcc thingy. Is that not the case anymore?

Peter

> > > - There is no support for generating position-independent code
> > > on PPC32. (PIC on PPC64 now works well). Nevertheless, I have
> > > sometimes run into linking errors when compiling shared
> > > libraries with C++ on PPC64.

> PPC64 is PIC by nature. As for the linking issue, possibly you blew
> the TOC with too many entries? It used to be that even with GCC, we could
> not compile doxygen (with or without -mminimal-toc) without filling
> up the TOC and hitting the TOC overflow linker error. To fix those
> types of problems, we recently added two more code models to
> GCC/binutils, so we're no longer limited to 16-bit TOC offsets. We
> now have -mcmodel=medium (32-bit TOC offsets) and -mcmodel=large
> (64-bit TOC offsets), with -mcmodel=medium being the new GCC default
> (on PPC64). The old TOC code is now called -mcmodel=small.

This is good to know, we should definitely make sure this is supported
in the clang driver. I believe that I've generally been able to compile
shared libraries on PPC64, but, when compiling Boost for example, I've
seen linking errors due to multiply defined constructor and destructor
symbols (I've not yet had a chance to look into this).

> > > - There is currently no support for generating loops using
> > > control-registers for branch and increment (I am not sure if
> > > this matters on POWER, but it does make some difference for
> > > small trip-count loops on the embedded cores).

> It helps on our server class hardware too, so we do make use of it.

> > > - Register reservations can use some improvement. We currently
> > > need to reserve an additional register to handle the corner
> > > case where a condition register need to be spilled into a large
> > > stack frame (one register to compute the address, and a second
> > > one into which to transfer the condition register's contents).
> > > I'd like to improve this at some point.

> Reserve as in you don't allow anything to be allocated to it just in
> the uncommon case you have to spill a condition reg to a stack slot
> you cannot write to with a 16-bit offset? Speaking as a person who
> has implemented register allocators, that is bad!

Yes, this is exactly what now happens, and it needs to be fixed (this
is also my fault, I introduced this behavior to fix a bug [the
register scavenger used by the spilling code only has one emergency
spill slot, and in the case you mentioned, we need two registers]).

> Roman pointed out to me that I misspoke. LLVM only generates PIC on
> Darwin, not for ELF. What does work on PPC64 is dynamic linking
> (meaning that it will correctly put nop after the calls so that the
> linker can do its thing). To support dynamic linking on PPC32 we'd
> need to explicitly add other things (stubs?) and that is not
> implemented.

> If by stubs you mean PLT call stubs, those are created by the linker
> for both PPC and PPC64 binaries.

Yes, exactly. I knew that the linker created these on PPC64, but I
thought some compiler involvement was necessary for PPC32. If that is
not true, then our job just got easier ;-)

Unfortunately, I know very little about this; the extent of my
experience is this: when I started working with the PPC backend, on
PPC64, the NOPs were not always placed after the calls correctly (which
predictably caused linking errors when using dynamic linking); I fixed
this and now I can dynamically link executables on PPC64.

If you could look at the asm produced and help us to figure out what,
if anything, is wrong with it, that would be greatly appreciated.

> I'm not sure what distro you're running on, but you may be hitting
> the new 32-bit secure-plt implementation all new distros are using.
> The old 32-bit PLT code used to generate a branch/return to the GOT
> and the updated LR value was used to gain addressability to the GOT.
> The problem is that the GOT is in the data section, so for that to
> work, the data section of your program had to be marked executable.
> With -msecure-plt (the new default for all new distros), that is
> no longer the case. Maybe the non-secure-plt code isn't playing
> well with the system crt*.o files and libs?
>
> Are there build directions for building LLVM for ppc/ppc64?
> I thought I had read that clang didn't work for ppc/ppc64 and that
> you had to use the llvm-gcc thingy. Is that not the case anymore?

LLVM/clang now will build in the normal way (./configure; make install)
on PPC (you'll need at least the 3.1 release candidate (or trunk)). I
generally build on my PPC64 hosts with:
make ENABLE_OPTIMIZED=1 OPTIMIZE_OPTION=-O2 EXTRA_OPTIONS=-mminimal-toc

Thanks again,
Hal

Ok, it built fine, but what is the llvm equivalent of gcc's -m32 -m64?
Google doesn't seem to be much help, nor is the clang --help output.

Peter

> LLVM/clang now will build in the normal way (./configure; make
> install) on PPC (you'll need at least the 3.1 release candidate (or
> trunk)). I generally build on my PPC64 hosts with:
> make ENABLE_OPTIMIZED=1 OPTIMIZE_OPTION=-O2
> EXTRA_OPTIONS=-mminimal-toc

> Ok, it built fine, but what is the llvm equivalent of gcc's -m32 -m64?
> Google doesn't seem to be much help, nor is the clang --help output.

Noted, we should fix that as well. By default it should build for
whatever the current host is (no special flags required). To
specifically build for something else, use:
-ccc-host-triple powerpc64-unknown-linux-gnu
or
-ccc-host-triple powerpc-unknown-linux-gnu

The other secret is to try: --help-hidden (for all of the driver
options) and also: -cc1 -help (this shows all of the compiler options;
many of these [especially the ones that look like gcc options] are
passed through to the compiler [for the others, you'd need to use
-Xclang <option>]).

-Hal

So LLVM isn't biarch capable? Meaning one LLVM compiler cannot
generate both 32-bit and 64-bit binaries?

Peter

> By default it should build for
> whatever the current host is (no special flags required). To
> specifically build for something else, use:
> -ccc-host-triple powerpc64-unknown-linux-gnu
> or
> -ccc-host-triple powerpc-unknown-linux-gnu

> So LLVM isn't biarch capable? Meaning one LLVM compiler cannot
> generate both 32-bit and 64-bit binaries?

It can, you can provide those on the clang command line.

-Hal

> By default it should build for
> whatever the current host is (no special flags required). To
> specifically build for something else, use:
> -ccc-host-triple powerpc64-unknown-linux-gnu
> or
> -ccc-host-triple powerpc-unknown-linux-gnu

> So LLVM isn't biarch capable? Meaning one LLVM compiler cannot
> generate both 32-bit and 64-bit binaries?

Sorry for replying to my own message, but...

Oh, -ccc-host-triple is a compiler option and not a configure option.
That does work, though it seems I have to link with gcc, since llvm
still wants to link against the 64-bit crt*.o and libs. Maybe it is
easier to just have two separate builds.

That said, my simple dynamically linked hello world executed fine
(i.e., it was able to call into libc.so just fine), as well as an
old C version of the SPEC97 tomcatv benchmark I have lying around.
So it seems both 32-bit and 64-bit can call into shared libs.

Not to say I haven't seen some code gen warts (using -O3). :-)

From hello.s:

    main:
        mflr 0
        stw 31, -4(1)
        stw 0, 4(1)
        stwu 1, -16(1)
        lis 3, .Lstr@ha
        mr 31, 1
        la 3, .Lstr@l(3)
        bl puts
        li 3, 0
        addi 1, 1, 16
        lwz 0, 4(1)
        lwz 31, -4(1)
        mtlr 0
        blr

By the strict letter of the 32-bit ABI, the save and restore of
r31 at a negative offset of r1 is verboten. The ABI states that
the stack space below the stack pointer is declared as volatile.
I actually debugged a similar problem way back in my Blue Gene/L
days, where gcc had a bug and was doing the same thing. We ended
up taking a signal between the restore of the stack pointer and
the restore of the nonvolatile reg, and the BGL compute node kernel
trashed the stack below the stack pointer.

The second wart is the dead copy to r31...which leads to the
unnecessary save and restore of r31.

For tomcatv, we have to basically save/restore the entire set
of non-volatile integer and fp registers. Looking at how
llvm does that shows:

        ...
        lis 3, 56
        ori 3, 3, 57680
        stwx 16, 31, 3
        lis 3, 56
        ori 3, 3, 57684
        stwx 17, 31, 3
        lis 3, 56
        ori 3, 3, 57688
        stwx 18, 31, 3
        lis 3, 56
        ori 3, 3, 57692
        stwx 19, 31, 3
        lis 3, 56
        ori 3, 3, 57696
        stwx 20, 31, 3
        lis 3, 56
        ori 3, 3, 57700
        stwx 21, 31, 3
        [repeated over and over and ...]

Kind of ugly! :-) GCC on the other hand stashes away the old value of
the stack pointer and then uses small negative offsets (legal at this
point since we've already decremented the stack pointer) from that for
all of its saves/restores:

        ...
        lis 0,0xffc7
        mr 12,1
        ori 0,0,7728
        stwux 1,1,0
        mflr 0
        stw 0,4(12)
        stfd 14,-144(12)
        stfd 15,-136(12)
        stfd 16,-128(12)
        stfd 17,-120(12)
        stfd 18,-112(12)
        ...

For things that don't work, do you have a small example program
that shows what's wrong?

Peter

> > By default it should build for
> > whatever the current host is (no special flags required). To
> > specifically build for something else, use:
> > -ccc-host-triple powerpc64-unknown-linux-gnu
> > or
> > -ccc-host-triple powerpc-unknown-linux-gnu
>
> So LLVM isn't biarch capable? Meaning one LLVM compiler cannot
> generate both 32-bit and 64-bit binaries?

> Sorry for replying to my own message, but...
>
> Oh, -ccc-host-triple is a compiler option and not a configure option.
> That does work, though it seems I have to link with gcc, since llvm
> still wants to link against the 64-bit crt*.o and libs. Maybe it is
> easier to just have two separate builds.

FWIW, you can also use the -gcc-toolchain and -ccc-gcc-name parameters
to switch what gcc install is used for linking [although it should
find the correct libs by itself, assuming things are in
vaguely-default install paths, but perhaps that is not working for
you?].

> That said, my simple dynamically linked hello world executed fine
> (i.e., it was able to call into libc.so just fine), as well as an
> old C version of the SPEC97 tomcatv benchmark I have lying around.
> So it seems both 32-bit and 64-bit can call into shared libs.
>
> Not to say I haven't seen some code gen warts (using -O3). :-)
>
> From hello.s:
>
> [snip]
>
> By the strict letter of the 32-bit ABI, the save and restore of
> r31 at a negative offset of r1 is verboten. The ABI states that
> the stack space below the stack pointer is declared as volatile.
> I actually debugged a similar problem way back in my Blue Gene/L
> days, where gcc had a bug and was doing the same thing. We ended
> up taking a signal between the restore of the stack pointer and
> the restore of the nonvolatile reg, and the BGL compute node kernel
> trashed the stack below the stack pointer.

Interesting, we should definitely fix this.

I've been trying to get things in working order here so that we can use
clang/llvm on our BG/P and Q [as soon as I finish writing
regression tests, I have support for Double Hummer and QPX ready, and
I'll contribute that as well].

> The second wart is the dead copy to r31...which leads to the
> unnecessary save and restore of r31.

And we should clean this up too ;-)

> For tomcatv, we have to basically save/restore the entire set
> of non-volatile integer and fp registers. Looking at how
> llvm does that shows:
>
> [snip]
>
> Kind of ugly! :-) GCC on the other hand stashes away the old value of
> the stack pointer and then uses small negative offsets (legal at this
> point since we've already decremented the stack pointer) from that for
> all of its saves/restores:
>
> [snip]
>
> For things that don't work, do you have a small example program
> that shows what's wrong?

Roman, can you comment?

Thanks again,
Hal

fwiw, on FreeBSD -m32 works just fine with clang. Take a look at
r132634/r132635. You can get some inspiration for the linux toolchain.

roman

Peter,

Could you please comment on:
http://llvm.org/bugs/show_bug.cgi?id=12757

Specifically, gcc seems to allow this:
int __flt_rounds() {
    unsigned long fpscr;
    __asm__ volatile("mffs %0" : "=f"(fpscr));
    return fpscr;
}

My reading of this is that gcc allocates a floating-point register to
hold the result of the mffs instruction, and then bit casts (and
truncates?) the result into the unsigned long variable. Is this
correct, and if so, is this a general gcc feature, or something PowerPC
specific?

Thanks again,
Hal

> By the strict letter of the 32-bit ABI, the save and restore of
> r31 at a negative offset of r1 is verboten. The ABI states that
> the stack space below the stack pointer is declared as volatile.
> I actually debugged a similar problem way back in my Blue Gene/L
> days, where gcc had a bug and was doing the same thing. We ended
> up taking a signal between the restore of the stack pointer and
> the restore of the nonvolatile reg, and the BGL compute node kernel
> trashed the stack below the stack pointer.

Just to confirm, this is an issue specific to the 32-bit ABI, correct?
gcc (4.4.6) seems to do the same thing for PPC64.

Thanks again,
Hal