Testing LLVM on OS X

I'm interested in getting LLVM running on OS X so I can play around with it and check it out. I downloaded the LLVM 1.2 package and compiled and installed it with no errors (I used the configure options --with-llvmgccdir and --enable-spec2000, pointing to the relevant directories). I want to look at the performance of SPEC CPU2000 with LLVM vs. gcc.

I was able to successfully compile and run the hello world program using LLVM. I then made a simple SPEC config file to try to compile SPEC with LLVM:

ext=ppc32_llvm
teeout=yes
teerunout=yes
default=default=default=default:

I'm interested in getting LLVM running on OS X so I can play around
with it and check it out. I downloaded the LLVM 1.2 package and
compiled and installed with no errors (used config options
--with-llvmgccdir and --enable-spec2000 pointing to the relevant
directories). I want to look at performance of SPEC CPU2000 with LLVM
vs gcc.

Great!

CC=/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc
CXX=/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/llvmg++
FC=g77
OPTIMIZE = -O3 -fomit-frame-pointer

To test that everything was set up correctly, I tried doing just the
164.gzip test first. It seemed to build the gzip test without errors,
but when it tried to run it, it said that it couldn't find the bytecode
file for that program. I looked at the files generated from the build,
and there's a script that calls lli and passes it a .bc file, but sure
enough the .bc file is not there. I realize this is sort of a SPEC
question rather than an LLVM question, but where would it put that .bc
file, and where do I need to copy that file in order for it to work?

The problem is that LLVM only compiles to .bc files by default. This
means that when you say:
  llvmgcc hello.c -o hello

You get a hello.bc file and hello "executable", which is really just a
shell script that runs the bytecode file with 'lli'.

There are two problems with this: 1) there is no JIT for PPC yet, so
LLVM will use the interpreter (which is intolerably slow and has other
issues). 2) SPEC compiles the executables in one place and then moves
them to another, but it only copies the shell script and not the
bytecode file, so you get that error message.

The normal solution to this problem is to link with the llvmgcc
'-Wl,-native' option. This tells llvmgcc to produce a native executable
instead of a shell script, using a PPC backend. I'm sure you immediately
see the problem with this. :)

The real solution to your problem is in CVS. There, llvmgcc supports a
-Wl,-native-cbe option which will use the C backend to create a native
executable for you, and can actually be used to run SPEC successfully.
If you're willing to grab LLVM CVS (details here:
http://llvm.cs.uiuc.edu/docs/GettingStarted.html#checkout ), this is the
best bet.

Second, according to the documentation, the OS X version of LLVM
doesn't generate native code and is run through the interpreter. I
assume this would make it a lot slower than if it did generate native
code.

Yes, by a factor of 1000 or so. :)

Is anyone currently working on making it generate native ppc code for OS
X? If not, what would be involved in doing so? I'm new to LLVM, I'd
appreciate some pointers to some info that might shed some light on
that.

There was a group that started working on this, but I believe they got
stalled and are no longer working on it. I will try to get in touch with
them to see where things stand, then get back to you on this.

Looks like a really interesting project! Thanks in advance!

Thanks! Sorry for the delayed response; I experienced an email avalanche
and your mail got buried. :)

-Chris

Thanks! Grabbed the latest from CVS and added that linker option to the
config file. It looks like it compiles and runs the SPEC tests ok now. Just
to make sure I understand how LLVM works, got a few clarifications:

1. The ppc code I'm generating with the -native-cbe is static, correct?
2. Is there a frontend to any of the other gnu compiler collection stuff? E.g.
g77 or something for compiling fortran code, or is c/c++ the focus right now?
3. Does the code that uses the JIT compiler (I know, doesn't currently exist
on ppc) currently take advantage of run-time optimization? If I ran the SPEC
suite with a JIT-compiled version several times, would I expect the score to
eventually increase at all, depending on what the benchmark does?

> Is anyone currently working on making it generate native ppc code for OS
> X? If not, what would be involved in doing so? I'm new to LLVM, I'd
> appreciate some pointers to some info that might shed some light on
> that.

There was a group that started working on this, but I believe they got
stalled and are no longer working on it. I will try to get in touch with
them to see where things stand, then get back to you on this.

I only know just enough about compilers to be dangerous, but I'm working on
learning more :). If no one else is currently working on this, what would be
needed in order to get the JIT compiler up and running on ppc/os x? Is it
just a matter of taking some tree/rtl form stuff and emitting ppc code
instead of x86 code? Is there code that needs to be ported to os x so it can
profile itself?

Patrick

Thanks! Grabbed the latest from CVS and added that linker option to the
config file. It looks like it compiles and runs the SPEC tests ok now.

Great!

Just to make sure I understand how LLVM works, got a few clarifications:

1. The ppc code I'm generating with the -native-cbe is static, correct?

Yes, it's purely static with the -native-cbe or -native options.

2. Is there a frontend to any of the other gnu compiler collection
stuff? E.g. g77 or something for compiling fortran code, or is c/c++ the
focus right now?

So far, only C and C++ are supported. Objective C support is
extremely close to working, but I haven't had time to work on it at all,
and I don't know Objective C very well in any case. :) Fortran, Java, and
Ada support have not been tried at all, and are likely to be more work than
objc.

If you are familiar with GCC, adding support for the other GCC front-ends
is not very difficult. It basically amounts to implementing the langhooks
that are used to expand the language-specific tree nodes into the
appropriate LLVM code. Of all of them, having G77 would be particularly
nice, because it would allow us to run the rest of SpecFP, though f2c
might be another option if it works very well (I've never tried it
before).

Long-term we'd like to have a nice F90 front-end, perhaps based on the
Cray front-end. If anyone is interested in Fortran, this would be an
excellent project. :)

3. Does the code that uses the JIT compiler (I know, doesn't currently
exist on ppc) currently take advantage of run-time optimization? If I
ran the SPEC suite with a JIT compiled version several times, would I
expect the score to eventually increase at all, depending on what the
benchmark does?

Nope, not currently. The ultimate goal is to support this, but right now
it's not implemented.

I only know just enough about compilers to be dangerous, but I'm working
on learning more :). If no one else is currently working on this, what
would be needed in order to get the JIT compiler up and running on
ppc/os x? Is it just a matter of taking some tree/rtl form stuff and
emitting ppc code instead of x86 code?

Basically a new code generator would need to be written. The current X86
code generator (ignoring the experimental instruction selectors) is about
8-9000 LOC (about 4800 SLOC, as reported by sloccount) in the
lib/Target/X86 directory, which should give you an idea of how much work
it currently is to implement a code generator. The X86 backend also has a
bunch of little optimizations in it, so a simple code generator would
probably be a lot less code. We are slowly working on developing tools
that will reduce this amount of code further, but they won't be ready for
some time.

I emailed the guys that started working on the PPC backend. My impression
is that it is close to working, but they ran into some issues with stack
frame layout, and hadn't had time to look into them recently. They are
currently trying to get it cleared through their company to release the
code, which will allow others to continue their work (cross your fingers
:).

Is there code that needs to be ported to os x so it can profile itself?

The only thing that LLVM is missing on OS/X (besides a code generator)
that I'm aware of is support for dynamic plugin loading (Brian would know
more). If you look in lib/Support/DynamicLinker.cpp, you'll see some
stuff that is only defined for HAVE_DLOPEN. It would be very handy to get
support for OS/X, which doesn't have the dl* family of syscalls.

-Chris

Just to make sure I understand how LLVM works, got a few clarifications:

1. The ppc code I'm generating with the -native-cbe is static, correct?

Yes, it's purely static with the -native-cbe or -native options.

Is there anything special flag-wise that I would need to specify to tell it to include symbol and debug information? I've tried specifying -g, but this information still doesn't seem to be included. Here is a quick copy of the build log for one of the tests, to make sure I've got the flags right:

Compiling Binaries
   Building 164.gzip ref base ppc32_llvm default
specmake clean 2> make.err | tee make.out
rm -rf gzip gzip.exe *.o core *.err *.out
specmake build 2> make.err | tee make.out
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o bits.o -g -O3 bits.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o deflate.o -g -O3 deflate.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o gzip.o -g -O3 gzip.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o getopt.o -g -O3 getopt.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o inflate.o -g -O3 inflate.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o lzw.o -g -O3 lzw.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o spec.o -g -O3 spec.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o trees.o -g -O3 trees.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o unlzh.o -g -O3 unlzh.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o unlzw.o -g -O3 unlzw.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o unpack.o -g -O3 unpack.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o unzip.o -g -O3 unzip.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o util.o -g -O3 util.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o zip.o -g -O3 zip.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -Wl,-native-cbe -O3 bits.o deflate.o gzip.o getopt.o inflate.o lzw.o spec.o trees.o unlzh.o unlzw.o unpack.o unzip.o util.o zip.o -o gzip
gzip.cbe.c:146: warning: conflicting types for built-in function `memcmp'
gzip.cbe.c:149: warning: conflicting types for built-in function `fprintf'
gzip.cbe.c:181: warning: conflicting types for built-in function `strrchr'
gzip.cbe.c:187: warning: conflicting types for built-in function `memcpy'
gzip.cbe.c:188: warning: conflicting types for built-in function `memset'
specmake options 2> options.err | tee options.out
COMP: /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o options.o -g -O3
LINK: /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -Wl,-native-cbe -O3 -o options

With the same set of flags, gcc includes debug information. Is there anything else I need to add to get LLVM to do this too?

Thanks,

Patrick

Nope. Right now LLVM doesn't have real support for source-level
debugging. There is a debugger *started*, but it needs substantial work
before it can be usable, and the C front-end cannot produce debug
information yet. If you're interested in the debugger, it is discussed
here:
http://llvm.cs.uiuc.edu/docs/SourceLevelDebugging.html

Sorry!

-Chris

Chris Lattner wrote:

The only thing that LLVM is missing on OS/X (besides a code generator)
that I'm aware of is support for dynamic plugin loading (Brian would know
more). If you look in lib/Support/DynamicLinker.cpp, you'll see some
stuff that is only defined for HAVE_DLOPEN. It would be very handy to get
support for OS/X, which doesn't have the dl* family of syscalls.

Hi Chris,

If I remember correctly, Panther (Mac OS X 10.3) has a dlopen (see http://www.linuxworld.com.au/index.php?id=660010615&fp=2&fpid=1). Otherwise, it is provided by the "dlcompat" library (http://www.opendarwin.org/projects/dlcompat/), which is very small.

So you can use the dlcompat headers for pre-10.3 systems, and the native dlopen for 10.3+.
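A minimal sketch of the kind of guard described here, assuming the build defines `HAVE_DLOPEN` whenever `<dlfcn.h>` is usable (natively on 10.3+, or through the dlcompat headers on older releases). `plugin_open` and `plugin_symbol` are hypothetical names; this is not the actual code in lib/Support/DynamicLinker.cpp, only the standard POSIX `dlopen`/`dlsym`/`dlerror` calls wrapped behind the same kind of HAVE_DLOPEN check.

```c
#include <stddef.h>
#include <stdio.h>

#ifdef HAVE_DLOPEN
/* Native on Mac OS X 10.3+, or supplied by the dlcompat headers on
   earlier releases (assumption: the build defines HAVE_DLOPEN in both cases). */
#include <dlfcn.h>
#endif

/* Hypothetical helper: load a plugin, writing a message to err on failure.
   Returns an opaque handle, or NULL if the library could not be loaded. */
void *plugin_open(const char *path, char *err, size_t errlen) {
#ifdef HAVE_DLOPEN
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        snprintf(err, errlen, "%s", msg ? msg : "unknown dlopen error");
    }
    return handle;
#else
    /* No dl* support at all: report the failure instead of loading. */
    (void)path;
    snprintf(err, errlen, "dynamic loading is not supported on this platform");
    return NULL;
#endif
}

/* Hypothetical helper: look up a symbol in a handle from plugin_open. */
void *plugin_symbol(void *handle, const char *name) {
#ifdef HAVE_DLOPEN
    return dlsym(handle, name);
#else
    (void)handle;
    (void)name;
    return NULL;
#endif
}
```

RTLD_NOW makes unresolved symbols fail at load time instead of at the first call, which tends to give clearer errors for plugins; RTLD_LAZY would be the other reasonable choice.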

Cheers,

-- Sébastien

I was able to run through all the C/C++ benchmarks in SPEC using LLVM. I'm on OS X 10.3.3. I did a quick comparison between LLVM (latest from CVS as of 4/27) and gcc 3.3 (Apple's build 20030304). For simplicity's sake, the only flag I used was -O3 for each compiler and I was using the C backend to generate native code for PPC.

Most of the LLVM results were close to gcc performance (within 5%), but a few of the tests caught my eye. 164.gzip ran about 25% slower on my system using LLVM versus gcc. As you said, source-level debugging information wasn't available for the LLVM binary, but from looking at a profile of the code, there are two functions that take up a moderate amount of time (zip and file_read) in the LLVM binary that are not in the profile of the gcc code. Is it likely that gcc would have inlined these? file_read is relatively small, but zip is a little bigger. I tried to test this theory by manually editing the gzip code to inline those two functions, e.g.

inline int zip( ...
inline int file_read ( ..

but when I profiled that new code, it still had those two functions in the profile. Does LLVM support inlining (or am I an idiot and tried to do it manually wrong)?

Patrick

I was able to run through all the C/C++ benchmarks in SPEC using LLVM.
I'm on OS X 10.3.3. I did a quick comparison between LLVM (latest from
CVS as of 4/27) and gcc 3.3 (Apple's build 20030304). For simplicity's
sake, the only flag I used was -O3 for each compiler and I was using
the C backend to generate native code for PPC.

Okay, sounds great. Are you using the -native-cbe option? Or are you
running llc -march=c ... and GCC manually?

Most of the LLVM results were close to gcc performance (within 5%), but
a few of the tests caught my eye. 164.gzip ran about 25% slower on my
system using LLVM versus gcc.

Hrm, I really want to figure this out!

As you said, source level debugging information wasn't available for the
LLVM binary but from looking at a profile of the code, there are two
functions that take up a moderate amount of time (zip and file_read) in
the LLVM binary but these functions are not in the profile of the gcc
code. Is it likely that gcc would have inlined these?

It's quite possible. The best way to check is to look at the .s file
produced by GCC and see if they are there. Note that GCC is much more
aggressive about inlining than LLVM is.

file_read is relatively small, but zip is a little bigger. I tried to
test this theory by manually editing the gzip code to inline those two
functions, eg

inline int zip( ...
inline int file_read ( ..

but when I profiled that new code, it still had those two functions in
the profile. Does LLVM support inlining (or am I an idiot and tried to
do it manually wrong)?

LLVM supports inlining, and you're not an idiot. :) The problem is that
LLVM doesn't "listen" to "inline" hints at all right now. If you would
like to adjust the inlining thresholds, you can pass
-Wa,-inline-threshold=XXX or -Wl,-inline-threshold=XXX to set the
compile-time or link-time inlining thresholds, respectively. These both
default to 200 (which has no units); if you increase the threshold, the
inliner will inline more.

If you want to see what inlining decisions are being made, pass
-debug-only=inline (with -Wa, or -Wl,) to see what "choices" the inliner
is making.

Note that, even without source-level debugging information, you can still
do performance investigation with LLVM. You can either look at the C code
generated by the CBE (which will hurt your eyes: brace yourself), or you
can look at the LLVM code directly, which will be easier to handle (once
you get used to reading LLVM).

I suspect that a large reason that LLVM does worse than a native C
compiler with the CBE+GCC is that LLVM generates very low-level C code,
and I'm not convinced that GCC is doing a very good job with it (i.e.,
without syntactic loops).

Please let me know what you find!

-Chris

Yup, this is EXACTLY what is going on.

I took this very simple C function:

int Array[1000];
void test(int X) {
  int i;
  for (i = 0; i < 1000; ++i)
    Array[i] += X;
}

Compiling with -O3 on OS/X gave me this:

_test:
        mflr r5
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        mflr r2
        mtlr r5
        addis r4,r2,ha16(L_Array$non_lazy_ptr-"L00000000001$pb")
        li r2,0
        lwz r9,lo16(L_Array$non_lazy_ptr-"L00000000001$pb")(r4)
        li r4,1000
        mtctr r4
L9:
        lwzx r7,r2,r9 ; load
        add r6,r7,r3 ; add
        stwx r6,r2,r9 ; store
        addi r2,r2,4 ; Increment pointer
        bdnz L9 ; Decrement count register, branch while not zero
        blr

This is nice code, good GCC. :)

Okay, LLVM currently generates this code from the CBE:

void test(int l7_X) {
  unsigned l8_indvar;
  unsigned l8_indvar__PHI_TEMPORARY;
  int *l14_tmp_2E_5;
  int l7_tmp_2E_9;
  unsigned l8_indvar_2E_next;

  l8_indvar__PHI_TEMPORARY = 0u; /* for PHI node */

l13_no_exit:
  l8_indvar = l8_indvar__PHI_TEMPORARY;
  l14_tmp_2E_5 = &Array[l8_indvar];
  l7_tmp_2E_9 = *l14_tmp_2E_5;
  *l14_tmp_2E_5 = (l7_tmp_2E_9 + l7_X);
  l8_indvar_2E_next = l8_indvar + 1u;
  if (!(l8_indvar_2E_next == 1000u)) {
    l8_indvar__PHI_TEMPORARY = l8_indvar_2E_next; /* for PHI node */
    goto l13_no_exit;
  }
  return;
}

This has exactly the same operations in the loop, so GCC should produce
the same code, right? Wrong:

_test:
        mflr r4
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        mflr r2
        mtlr r4
        li r11,0
        addis r10,r2,ha16(_Array-"L00000000001$pb")
L2:
        slwi r2,r11,2 ; Shift left "i" by 2
        la r5,lo16(_Array-"L00000000001$pb")(r10)
        cmpwi cr0,r11,999 ; compare i to the trip count
        lwzx r7,r2,r5 ; Load from array
        addi r11,r11,1 ; increment "i"
        add r6,r7,r3 ; Add value to array value
        stwx r6,r2,r5 ; store into array
        bne+ cr0,L2 ; Loop until done
        blr

Hrm, basically gcc is not doing ANY loop optimization (e.g.
strength reduction or "do-loop" optimization) whatsoever. I'm sure that
the X86 GCC is suffering from the same problems; it's just that X86
doesn't depend on strength reduction and do-loop optimization as much, so
it's not so pronounced.
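To make the two terms concrete, here is a hand-written C sketch (not compiler output) of what they do to the Array loop above. `add_indexed` mirrors the original source; `add_strength_reduced` is a manual rewrite in which the index multiply is replaced by a pointer increment and the loop is driven by a down-counter, roughly the shape of the `addi r2,r2,4` / `bdnz` loop GCC emitted for the original source. The function names and the pointer-parameter signature are illustrative assumptions.

```c
#include <stddef.h>

#define N 1000

/* Form of the original source: the address of a[i] is recomputed from i
   on every iteration (an implicit i * sizeof(int) each time). */
void add_indexed(int *a, int x) {
    for (int i = 0; i < N; ++i)
        a[i] += x;
}

/* Hand-applied strength reduction plus a "do-loop" style counter:
   - the multiply hidden in &a[i] becomes a pointer bump of one element;
   - a separate down-counter controls the trip count, so the exit test is
     "counter reached zero", which PPC can fold into a single bdnz. */
void add_strength_reduced(int *a, int x) {
    int *p = a;
    size_t count = N;
    do {
        *p += x;
        ++p;
    } while (--count != 0);
}
```

Both functions leave the array in the same state; the difference is only in the loop shape, which is what determines whether the backend can produce the tight loop.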

Interestingly, if I tweak the .cbe code to be this:

  do {
  l8_indvar = l8_indvar__PHI_TEMPORARY;
  l14_tmp_2E_5 = &Array[l8_indvar];
  l7_tmp_2E_9 = *l14_tmp_2E_5;
  *l14_tmp_2E_5 = (l7_tmp_2E_9 + l7_X);
  l8_indvar_2E_next = l8_indvar + 1u;
  l8_indvar__PHI_TEMPORARY = l8_indvar_2E_next; /* for PHI node */
  } while (!(l8_indvar_2E_next == 1000u));

GCC generates the nice code again, virtually identical to the code from
the original source. AAAH! :)
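For anyone who wants to try this themselves, here is a self-contained C sketch placing the two shapes side by side. `test_goto` keeps the label/goto structure of the CBE output quoted above (with shortened variable names), and `test_do_while` is the hand-tweaked version; they perform the same operations and differ only in whether the loop is syntactic. The function and variable names are hypothetical, and whether a particular GCC release still treats the two differently is something to check by compiling each with `gcc -O3 -S` and comparing the generated .s files.

```c
#define TRIP_COUNT 1000u

int Array[1000];

/* Shape produced by the C backend: a label plus a conditional goto.
   The PHI node is modelled by a separate "temporary" variable. */
void test_goto(int x) {
    unsigned indvar;
    unsigned indvar_phi = 0u;          /* incoming value of the PHI node */
    unsigned indvar_next;
    int *elt;

no_exit:
    indvar = indvar_phi;
    elt = &Array[indvar];
    *elt = *elt + x;
    indvar_next = indvar + 1u;
    if (!(indvar_next == TRIP_COUNT)) {
        indvar_phi = indvar_next;      /* feed the PHI for the next pass */
        goto no_exit;
    }
}

/* The same operations, wrapped in a syntactic do/while loop. */
void test_do_while(int x) {
    unsigned indvar;
    unsigned indvar_phi = 0u;
    unsigned indvar_next;
    int *elt;

    do {
        indvar = indvar_phi;
        elt = &Array[indvar];
        *elt = *elt + x;
        indvar_next = indvar + 1u;
        indvar_phi = indvar_next;
    } while (!(indvar_next == TRIP_COUNT));
}
```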

Maybe this is a good argument for making the CBE generate syntactic loops
in simple cases. I may have some time to try implementing this on the
weekend. That is, if no one beats me to it. :)

-Chris

Interesting. Now that you mention it, I do recall thinking the loops that llvm generated looked a bit different than the gcc loops. I'll go back and take another look, but this might explain some of that discrepancy.

In response to the other email:

I'm using the -native-cbe option to generate the code. From your last email, it sounds like when you specify this option, it compiles everything to the llvm code, then the CBE generates C code based on the generated llvm code, and THAT is what is compiled to native code rather than the original code itself? That was one of the other things I was going to get around to asking about eventually, it seems like llvm takes an eternity and a half to link these programs, but if this "linking" is really llvm code -> c code -> compile & link again, that would explain why it takes so long.

I have to confess I'm not as familiar with gcc as I'd like to be. Where would gcc put the .s file (or what flags do I have to specify to create one?) Also, where would llvm put the llvm code and the C code that the backend generates (or what do I need to specify to tell it to keep that around)?

I took a look at the inlining decisions that llvm prints out when you specify -Wl,-debug-only=inline. Let me make sure I understand how to interpret this output correctly:

Inliner visiting SCC: .gen_codes_26
   Inspecting function: .gen_codes_26
     Inlining: cost=100, Call: %tmp.37 = call uint %bi_reverse( uint %tmp.42, int %tmp.27 ) ; <uint> [#uses=1]
Inliner visiting SCC: .pqdownheap_35
   Inspecting function: .pqdownheap_35
Inliner visiting SCC: .build_tree_41
   Inspecting function: .build_tree_41
     NOT Inlining: cost=501, Call: call void %.pqdownheap_35( %struct.ct_data* %tmp.2, int %n.1.0 )
     NOT Inlining: cost=466, Call: call void %.pqdownheap_35( %struct.ct_data* %tmp.2, int 1 )
     NOT Inlining: cost=466, Call: call void %.pqdownheap_35( %struct.ct_data* %tmp.2, int 1 )
     NOT Inlining: cost=406, Call: call void %.gen_codes_26( %struct.ct_data* %tmp.2, int %max_code.1.0 )

So it looks at each function to try to determine if it should be inlined, comes up with a "cost" to inline it based on what it takes as parameters and how often it's called, and if this cost is below the threshold specified, then it inlines it? What about this build_tree function from the log? It says multiple times it's not inlined. Is the decision whether to inline it or not made for the function as a whole (e.g. always inline or always don't), or is it decided on a call-by-call basis? What does it mean for something like pqdownheap, where it doesn't give a cost with a yea or nay?
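Reading the log above at face value, the "Inspecting function" line names the caller, and each indented line is one call site inside it with its own cost: the two calls to .pqdownheap_35 that pass the constant 1 cost 466, while the one passing %n.1.0 costs 501. That suggests the decision is made per call site rather than per function. The toy model below encodes that reading, assuming a call is inlined when its cost is strictly below the threshold (200 by default, per the earlier message); the struct and function names are hypothetical, and LLVM's actual cost computation is more involved. Under the same reading, .pqdownheap_35 printing no indented lines would just mean it contains no call sites the inliner considered; that is an inference from the log format, not something confirmed here.

```c
#include <stdio.h>

/* One call site as printed by -debug-only=inline (hypothetical struct). */
struct call_site {
    const char *caller;  /* the "Inspecting function:" name */
    const char *callee;  /* the function called at this site */
    int cost;            /* estimated cost of inlining this particular call */
};

/* Assumption: a call site is inlined when its cost is strictly below the
   threshold; LLVM's real cost function and comparison may differ. */
int should_inline(const struct call_site *cs, int threshold) {
    return cs->cost < threshold;
}

/* Print one decision in roughly the same style as the log. */
void report(const struct call_site *cs, int threshold) {
    printf("%s: cost=%d, %s -> %s\n",
           should_inline(cs, threshold) ? "Inlining" : "NOT Inlining",
           cs->cost, cs->caller, cs->callee);
}
```

Under this model, raising the link-time threshold to 500 (e.g. -Wl,-inline-threshold=500) would inline the two 466-cost calls but still not the 501-cost one.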

Also, on an unrelated note, I could have sworn all those benchmarks compiled but I went back to double check and I saw that there were a few problems.

253.perlbmk builds fine but crashes when running through the spec test. I recall seeing a note a few places on the website that said perlbmk didn't work properly due to a longjmp bug, is that still a known bug or should I try running that through a debugger to find the problem?

176.gcc generates an ICE when trying to compile with llvm and -O3. Here's the build log; are there any other files that might shed more light on this problem?

Patrick

We will use: 176.gcc
Compiling Binaries
   Building 176.gcc ref base ppc32_llvm default
specmake clean 2> make.err | tee make.out
rm -rf cc1 cc1.exe *.o core *.err *.out
specmake build 2> make.err | tee make.out
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-parse.o -DHOST_WORDS_BIG_ENDIAN -O3 c-parse.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-lang.o -DHOST_WORDS_BIG_ENDIAN -O3 c-lang.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-lex.o -DHOST_WORDS_BIG_ENDIAN -O3 c-lex.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-pragma.o -DHOST_WORDS_BIG_ENDIAN -O3 c-pragma.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-decl.o -DHOST_WORDS_BIG_ENDIAN -O3 c-decl.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-typeck.o -DHOST_WORDS_BIG_ENDIAN -O3 c-typeck.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-convert.o -DHOST_WORDS_BIG_ENDIAN -O3 c-convert.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-aux-info.o -DHOST_WORDS_BIG_ENDIAN -O3 c-aux-info.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-common.o -DHOST_WORDS_BIG_ENDIAN -O3 c-common.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o c-iterate.o -DHOST_WORDS_BIG_ENDIAN -O3 c-iterate.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o toplev.o -DHOST_WORDS_BIG_ENDIAN -O3 toplev.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o version.o -DHOST_WORDS_BIG_ENDIAN -O3 version.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o tree.o -DHOST_WORDS_BIG_ENDIAN -O3 tree.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o print-tree.o -DHOST_WORDS_BIG_ENDIAN -O3 print-tree.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o stor-layout.o -DHOST_WORDS_BIG_ENDIAN -O3 stor-layout.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o fold-const.o -DHOST_WORDS_BIG_ENDIAN -O3 fold-const.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o function.o -DHOST_WORDS_BIG_ENDIAN -O3 function.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o stmt.o -DHOST_WORDS_BIG_ENDIAN -O3 stmt.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o expr.o -DHOST_WORDS_BIG_ENDIAN -O3 expr.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o calls.o -DHOST_WORDS_BIG_ENDIAN -O3 calls.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o expmed.o -DHOST_WORDS_BIG_ENDIAN -O3 expmed.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o explow.o -DHOST_WORDS_BIG_ENDIAN -O3 explow.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o optabs.o -DHOST_WORDS_BIG_ENDIAN -O3 optabs.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o varasm.o -DHOST_WORDS_BIG_ENDIAN -O3 varasm.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o rtl.o -DHOST_WORDS_BIG_ENDIAN -O3 rtl.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o print-rtl.o -DHOST_WORDS_BIG_ENDIAN -O3 print-rtl.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o rtlanal.o -DHOST_WORDS_BIG_ENDIAN -O3 rtlanal.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o emit-rtl.o -DHOST_WORDS_BIG_ENDIAN -O3 emit-rtl.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o real.o -DHOST_WORDS_BIG_ENDIAN -O3 real.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o dbxout.o -DHOST_WORDS_BIG_ENDIAN -O3 dbxout.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o sdbout.o -DHOST_WORDS_BIG_ENDIAN -O3 sdbout.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o dwarfout.o -DHOST_WORDS_BIG_ENDIAN -O3 dwarfout.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o xcoffout.o -DHOST_WORDS_BIG_ENDIAN -O3 xcoffout.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o integrate.o -DHOST_WORDS_BIG_ENDIAN -O3 integrate.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o jump.o -DHOST_WORDS_BIG_ENDIAN -O3 jump.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o cse.o -DHOST_WORDS_BIG_ENDIAN -O3 cse.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o loop.o -DHOST_WORDS_BIG_ENDIAN -O3 loop.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o unroll.o -DHOST_WORDS_BIG_ENDIAN -O3 unroll.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o flow.o -DHOST_WORDS_BIG_ENDIAN -O3 flow.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o stupid.o -DHOST_WORDS_BIG_ENDIAN -O3 stupid.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o combine.o -DHOST_WORDS_BIG_ENDIAN -O3 combine.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o regclass.o -DHOST_WORDS_BIG_ENDIAN -O3 regclass.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o local-alloc.o -DHOST_WORDS_BIG_ENDIAN -O3 local-alloc.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o global.o -DHOST_WORDS_BIG_ENDIAN -O3 global.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o reload.o -DHOST_WORDS_BIG_ENDIAN -O3 reload.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o reload1.o -DHOST_WORDS_BIG_ENDIAN -O3 reload1.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o caller-save.o -DHOST_WORDS_BIG_ENDIAN -O3 caller-save.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-peep.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-peep.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o reorg.o -DHOST_WORDS_BIG_ENDIAN -O3 reorg.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o sched.o -DHOST_WORDS_BIG_ENDIAN -O3 sched.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o final.o -DHOST_WORDS_BIG_ENDIAN -O3 final.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o recog.o -DHOST_WORDS_BIG_ENDIAN -O3 recog.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o reg-stack.o -DHOST_WORDS_BIG_ENDIAN -O3 reg-stack.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-opinit.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-opinit.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-recog.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-recog.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-extract.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-extract.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-output.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-output.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-emit.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-emit.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o insn-attrtab.o -DHOST_WORDS_BIG_ENDIAN -O3 insn-attrtab.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o m88k.o -DHOST_WORDS_BIG_ENDIAN -O3 m88k.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o getpwd.o -DHOST_WORDS_BIG_ENDIAN -O3 getpwd.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o convert.o -DHOST_WORDS_BIG_ENDIAN -O3 convert.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o bc-emit.o -DHOST_WORDS_BIG_ENDIAN -O3 bc-emit.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o bc-optab.o -DHOST_WORDS_BIG_ENDIAN -O3 bc-optab.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o obstack.o -DHOST_WORDS_BIG_ENDIAN -O3 obstack.c
/Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -Wl,-native-cbe -O3 c-parse.o c-lang.o c-lex.o c-pragma.o c-decl.o c-typeck.o c-convert.o c-aux-info.o c-common.o c-iterate.o toplev.o version.o tree.o print-tree.o stor-layout.o fold-const.o function.o stmt.o expr.o calls.o expmed.o explow.o optabs.o varasm.o rtl.o print-rtl.o rtlanal.o emit-rtl.o real.o dbxout.o sdbout.o dwarfout.o xcoffout.o integrate.o jump.o cse.o loop.o unroll.o flow.o stupid.o combine.o regclass.o local-alloc.o global.o reload.o reload1.o caller-save.o insn-peep.o reorg.o sched.o final.o recog.o reg-stack.o insn-opinit.o insn-recog.o insn-extract.o insn-output.o insn-emit.o insn-attrtab.o m88k.o getpwd.o convert.o bc-emit.o bc-optab.o obstack.o -lm -o cc1
WARNING: While resolving call to function '.plain_type_6' arguments were dropped!
WARNING: While resolving call to function '.plain_type_6' arguments were dropped!
WARNING: While resolving call to function '.plain_type_6' arguments were dropped!
combine.c: In function `find_split_point':

combine.c:2443: warning: function returns address of local variable
combine.c:2509: warning: function returns address of local variable
combine.c:2683: warning: function returns address of local variable
combine.c:2695: warning: function returns address of local variable
combine.c:2747: warning: function returns address of local variable
WARNING: Type conflict between types named 'struct.var_refs_queue'.
     Src=' %struct.var_refs_queue'.
    Dest=' %struct.var_refs_queue'
WARNING: Type conflict between types named 'struct.sequence_stack'.
     Src=' %struct.sequence_stack'.
    Dest=' %struct.sequence_stack'
WARNING: Type conflict between types named 'struct.function'.
     Src=' %struct.function'.
    Dest=' %struct.function'
WARNING: Type conflict between types named 'struct.var_refs_queue'.
     Src=' %struct.var_refs_queue'.
    Dest=' %struct.var_refs_queue'
WARNING: Type conflict between types named 'struct.sequence_stack'.
     Src=' %struct.sequence_stack'.
    Dest=' %struct.sequence_stack'
WARNING: Type conflict between types named 'struct.function'.
     Src=' %struct.function'.
    Dest=' %struct.function'
WARNING: Type conflict between types named 'struct.var_refs_queue'.
     Src=' %struct.var_refs_queue'.
    Dest=' %struct.var_refs_queue'
WARNING: Type conflict between types named 'struct.sequence_stack'.
     Src=' %struct.sequence_stack'.
    Dest=' %struct.sequence_stack'
WARNING: Type conflict between types named 'struct.function'.
     Src=' %struct.function'.
    Dest=' %struct.function'
WARNING: Type conflict between types named 'struct.var_refs_queue'.
     Src=' %struct.var_refs_queue'.
    Dest=' %struct.var_refs_queue'
WARNING: Type conflict between types named 'struct.sequence_stack'.
     Src=' %struct.sequence_stack'.
    Dest=' %struct.sequence_stack'
WARNING: Type conflict between types named 'struct.function'.
     Src=' %struct.function'.
    Dest=' %struct.function'
WARNING: Type conflict between types named 'struct.var_refs_queue'.
     Src=' %struct.var_refs_queue'.
    Dest=' %struct.var_refs_queue'
WARNING: Type conflict between types named 'struct.sequence_stack'.
     Src=' %struct.sequence_stack'.
    Dest=' %struct.sequence_stack'
WARNING: Type conflict between types named 'struct.function'.
     Src=' %struct.function'.
    Dest=' %struct.function'
WARNING: Found global types that are not compatible:
          %struct.rtx_def* (%struct.increment_operator*, %union.tree_node*)* %bc_expand_increment
          void (%struct.increment_operator*, %union.tree_node*)* %bc_expand_increment
WARNING: Found global types that are not compatible:
          int (...)* %bc_xstrdup
          sbyte* (sbyte*)* %bc_xstrdup
WARNING: Found global types that are not compatible:
          void (...)* %dump_flow_info
          void (%struct.__sFILE*)* %dump_flow_info
          int (...)* %dump_flow_info
WARNING: Found global types that are not compatible:
          int (...)* %expand_expr
          %struct.rtx_def* (...)* %expand_expr
          %struct.rtx_def* (%union.tree_node*, %struct.rtx_def*, uint, uint)* %expand_expr
WARNING: Found global types that are not compatible:
          %struct.function* (%union.tree_node*)* %find_function_data
          { \2, sbyte*, %union.tree_node*, int, int, int, int, int, int, int, int, int, int, %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, int, int, %struct.rtx_def*, int, int, int, %struct.rtx_def**, int, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, int, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, %struct.rtx_def*, %union.tree_node*, %union.tree_node*, int, %struct.temp_slot*, int, { %struct.rtx_def*, uint, int, \2 }*, %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, int, int, %union.tree_node*, %struct.rtx_def*, int, sbyte*, int, %struct.goto_fixup*, int, int, %union.tree_node*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, int, int, %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, { %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, \2 }*, int, int, sbyte*, sbyte*, int, %struct.rtx_def**, %union.tree_node*, %union.tree_node*, %union.tree_node*, %union.tree_node*, %union.tree_node*, int, int, %struct.momentary_level*, sbyte*, sbyte*, sbyte*, sbyte*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.simple_obstack_stack*, int, int, %struct.machine_function*, %struct.rtx_def*, %struct.constant_descriptor**, %struct.pool_sym**, %struct.pool_constant*, %struct.pool_constant*, int }* (%union.tree_node*)* %find_function_data
WARNING: Found global types that are not compatible:
          %struct.function** %outer_function_chain
          { \2, sbyte*, %union.tree_node*, int, int, int, int, int, int, int, int, int, int, %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, int, int, %struct.rtx_def*, int, int, int, %struct.rtx_def**, int, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, int, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, %struct.rtx_def*, %union.tree_node*, %union.tree_node*, int, %struct.temp_slot*, int, { %struct.rtx_def*, uint, int, \2 }*, %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, %struct.nesting*, int, int, %union.tree_node*, %struct.rtx_def*, int, sbyte*, int, %struct.goto_fixup*, int, int, %union.tree_node*, %struct.rtx_def*, %struct.rtx_def*, %struct.rtx_def*, int, int, %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, { %struct.rtx_def*, %struct.rtx_def*, %union.tree_node*, \2 }*, int, int, sbyte*, sbyte*, int, %struct.rtx_def**, %union.tree_node*, %union.tree_node*, %union.tree_node*, %union.tree_node*, %union.tree_node*, int, int, %struct.momentary_level*, sbyte*, sbyte*, sbyte*, sbyte*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.obstack*, %struct.simple_obstack_stack*, int, int, %struct.machine_function*, %struct.rtx_def*, %struct.constant_descriptor**, %struct.pool_sym**, %struct.pool_constant*, %struct.pool_constant*, int }** %outer_function_chain
WARNING: While resolving call to function 'gen_call_value' arguments were dropped!
WARNING: While resolving call to function 'gen_call' arguments were dropped!
WARNING: While resolving call to function 'gen_call_value' arguments were dropped!
cc1.cbe.c:1560: warning: conflicting types for built-in function `fprintf'
cc1.cbe.c:1640: warning: conflicting types for built-in function `sprintf'
cc1.cbe.c:1757: warning: conflicting types for built-in function `strncmp'
cc1.cbe.c:1763: warning: conflicting types for built-in function `strchr'
cc1.cbe.c:1846: warning: conflicting types for built-in function `memcmp'
cc1.cbe.c:2321: warning: conflicting types for built-in function `strrchr'
cc1.cbe.c:3048: warning: conflicting types for built-in function `memcpy'
cc1.cbe.c:3049: warning: conflicting types for built-in function `memset'
cc1.cbe.c: In function `l2493_recog_5':
cc1.cbe.c:607726: internal compiler error: in final_scan_insn, at final.c:2189
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
specmake options 2> options.err | tee options.out
COMP: /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -c -o options.o -DHOST_WORDS_BIG_ENDIAN -O3
LINK: /Users/patrick/Desktop/LLVM/cfrontend/ppc/llvm-gcc/bin/gcc -Wl,-native-cbe -O3 -lm -o options
   Some files did not appear to be built: cc1
*** Error building 176.gcc

>> and I'm not convinced that GCC is doing a very good job (ie, without
>> syntactic loops).
>
> Yup, this is EXACTLY what is going on.

Interesting. Now that you mention it, I do recall thinking the loops
that llvm generated looked a bit different than the gcc loops. I'll go
back and take another look, but this might explain some of that
discrepancy.

I'll try to put together a "solution" for this some time in the near
future. Since right now we depend on the CBE for performance, and because
so many people use GCC, we really are REQUIRED to cover for this if we
want to provide competitive performance. I imagine that this should
improve loop-intensive codes substantially.
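To make "syntactic loops" concrete, here are two functions that compute the same sum. The first uses a `for` statement. The second expresses the same control flow with labels and `goto`, roughly the way C backend output does. It is a hand-written approximation, not actual CBE output. The concern in this thread is that GCC's loop optimizer keys on loop constructs written in the source. Code shaped like the second function can therefore miss loop optimizations even though it does the same work.

```c
#include <assert.h>

/* A structured loop: the 'for' statement is visible to a loop
   optimizer that works from source-level loop constructs. */
static int sum_for(const int *a, int n) {
    assert(n >= 0);
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* The same computation built from labels and gotos, loosely in the
   style of C backend output (hand-written approximation). There is no
   'for' or 'while' statement here for the optimizer to key on. */
static int sum_goto(const int *a, int n) {
    assert(n >= 0);
    int s = 0;
    int i = 0;
    if (n == 0)
        goto done;
loop:
    s += a[i];
    ++i;
    if (i < n)
        goto loop;
done:
    return s;
}
```

To look for the difference, compile both functions with `gcc -O3 -S` and compare the generated assembly.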

In response to the other email:

I'm using the -native-cbe option to generate the code. From your last
email, it sounds like when you specify this option, it compiles
everything to the llvm code, then the CBE generates C code based on the
generated llvm code, and THAT is what is compiled to native code rather
than the original code itself?

Yes, that's exactly right.

That was one of the other things I was going to get around to asking
about eventually: it seems like llvm takes an eternity and a half to
link these programs, but if this "linking" is really llvm code -> c code
-> compile & link again, that would explain why it takes so long.

Yup, that's what's going on. Until we have a native code generator for
PPC, it will have to stay that way unfortunately.

I have to confess I'm not as familiar with gcc as I'd like to be. Where
would gcc put the .s file (or what flags do I have to specify to create
one)?

If you're using the -native-cbe option, the .s file is removed, and there
isn't a flag to preserve it. The way to get one is to run the CBE on the
.bc file (which should be generated alongside the native program) and
compile the result manually with GCC, i.e.:

$ llc -march=c foo.bc -o foo.cbe.c
$ gcc -O3 foo.cbe.c -S
$ less foo.cbe.s

Also, where would llvm put the llvm code and the C code that the
backend generates (or what do I need to specify to tell it to keep that
around)?

The only way to get that is to use the commands above on the bytecode
file. The LLVM bytecode file should be generated alongside the native
executable, with a .bc suffix appended to its name.

I took a look at the inlining decisions that llvm prints out when you
specify -Wl,-debug-only=inline. Let me make sure I understand how to
interpret this output correctly:

Inliner visiting SCC: .gen_codes_26
   Inspecting function: .gen_codes_26
     Inlining: cost=100, Call: %tmp.37 = call uint %bi_reverse( uint %tmp.42, int %tmp.27 ) ; <uint> [#uses=1]
Inliner visiting SCC: .pqdownheap_35
   Inspecting function: .pqdownheap_35
Inliner visiting SCC: .build_tree_41
   Inspecting function: .build_tree_41
     NOT Inlining: cost=501, Call: call void %.pqdownheap_35( %struct.ct_data* %tmp.2, int %n.1.0 )
     NOT Inlining: cost=466, Call: call void %.pqdownheap_35( %struct.ct_data* %tmp.2, int 1 )
     NOT Inlining: cost=466, Call: call void %.pqdownheap_35( %struct.ct_data* %tmp.2, int 1 )
     NOT Inlining: cost=406, Call: call void %.gen_codes_26( %struct.ct_data* %tmp.2, int %max_code.1.0 )

So it looks at each function to try to determine if it should be
inlined, comes up with a "cost" to inline it based on what it takes as
parameters and how often it's called, and inlines it if this cost is
below the specified threshold?

Yes, basically.

What about this build_tree function from the log? It says multiple times
that it's not inlined. Is the decision whether to inline it made for
the function as a whole (e.g., always inline or never inline) or is it
decided on a call-by-call basis? What does it mean for something like
pqdownheap, where it doesn't give a cost with a yea or nay?

It does this on a call-site by call-site basis. It appears that
build_tree is calling pqdownheap 3 times. In two of those, it passes a
constant one as the second argument. The inliner is looking into the
function and deciding that it could simplify the resultant code a bit
because of the constant, so the cost of inlining pqdownheap at those
two calls is lower than the cost of inlining it at the first.

The "inspecting" and "visiting" lines indicate the function that the
inliner is looking at (i.e., it is inspecting calls inside of that
function). The inliner works in a "bottom-up" fashion, inlining the
leaves of the call graph before it attempts to inline the roots.
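The bottom-up order can be sketched as a post-order depth-first walk of the call graph, in which a function is recorded only after everything it calls. The edges are assumed from the log above: build_tree calls gen_codes and pqdownheap, and gen_codes calls bi_reverse. The sketch also assumes there is no recursion. LLVM's inliner visits strongly connected components, so mutually recursive functions would be grouped into one unit, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Functions from the log; the enum order is deliberately not bottom-up. */
enum { BUILD_TREE, PQDOWNHEAP, GEN_CODES, BI_REVERSE, NUM_FUNCS };

/* calls[f][g] is true if function f contains a call to g (assumed edges). */
static const bool calls[NUM_FUNCS][NUM_FUNCS] = {
    [BUILD_TREE] = { [PQDOWNHEAP] = true, [GEN_CODES] = true },
    [GEN_CODES]  = { [BI_REVERSE] = true },
};

/* Post-order DFS: f is appended to `order` only after all of its
   callees, so leaves of the call graph come first. Assumes no recursion
   (every strongly connected component is a single function). */
static void visit(int f, bool *seen, int *order, int *count) {
    if (seen[f])
        return;
    seen[f] = true;
    for (int g = 0; g < NUM_FUNCS; ++g)
        if (calls[f][g])
            visit(g, seen, order, count);
    order[(*count)++] = f;
}

/* Fills `order` with a bottom-up visiting order and returns its length. */
static int bottom_up_order(int *order) {
    bool seen[NUM_FUNCS] = { false };
    int count = 0;
    for (int f = 0; f < NUM_FUNCS; ++f)
        visit(f, seen, order, &count);
    assert(count == NUM_FUNCS);
    return count;
}
```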

Also, on an unrelated note, I could have sworn all those benchmarks
compiled but I went back to double check and I saw that there were a
few problems.

Okay.

253.perlbmk builds fine but crashes when running through the spec test.
I recall seeing a note in a few places on the website that said perlbmk
didn't work properly due to a longjmp bug; is that still a known bug or
should I try running that through a debugger to find the problem?

Hrm, that's an interesting question. To get setjmp/longjmp working, you
have to pass -enable-correct-eh-support to the 'llc' command. I don't
think there is any way to get the -native-cbe option to do this, so if you
want to test it, you'll have to use the commands above to compile it by
hand. :(
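Before rebuilding all of perlbmk, a small program that depends on setjmp/longjmp can serve as a quick smoke test for the hand-built pipeline. This is a generic example, not code from perlbmk. The idea is an assumption, not something confirmed in the thread. Compile it with llvm-gcc, run `llc` with -enable-correct-eh-support on the resulting .bc file as described above, compile the output with GCC, and check that `jumped_back` still returns 1.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* Recurses `depth` levels, then unwinds all of them at once with longjmp. */
static void unwind_from(int depth) {
    if (depth == 0)
        longjmp(env, 1);
    unwind_from(depth - 1);
}

/* Returns 1 if control came back through longjmp, 0 otherwise.
   setjmp is used only in a controlling expression, as C requires. */
static int jumped_back(int depth) {
    assert(depth >= 0);
    if (setjmp(env) == 0) {
        unwind_from(depth);
        return 0;  /* not reached: unwind_from always calls longjmp */
    }
    return 1;      /* resumed here after longjmp */
}
```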

176.gcc generates an ICE when trying to compile with llvm and -O3.
Here's the build log; are there any other files that might shed more
light on this problem?

Hrm, I have no idea. That's a bug in GCC or in Apple's modifications to
GCC. You might try filing a bug with them (providing the .cbe.c file);
they might be able to suggest a workaround.

-Chris

Patrick


_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://mail.cs.uiuc.edu/mailman/listinfo/llvmdev

-Chris

and I'm not convinced that GCC is doing a very good job (ie, without
syntactic loops).

Yup, this is EXACTLY what is going on.

Interesting. Now that you mention it, I do recall thinking the loops
that llvm generated looked a bit different than the gcc loops. I'll go
back and take another look, but this might explain some of that
discrepancy.

I'll try to put together a "solution" for this some time in the near
future. Since right now we depend on the CBE for performance, and because
so many people use GCC, we really are REQUIRED to cover for this if we
want to provide competitive performance. I imagine that this should
improve loop-intensive codes substantially.

What is the rationale for doing the llvm code -> c backend code -> compile that thing anyway? If the person specifies -native-cbe, why not just compile the code as is?

inlining stuff

I did play around with the inlining threshold to make file_read and zip get inlined in more spots. This didn't make any difference though, so the main problem seems to be the loops, as you pointed out in your last email. Wouldn't this be a non-issue if the -native-cbe flag just meant compile the code given rather than going through llvm code and the c backend?

Also, on an unrelated note, I could have sworn all those benchmarks
compiled but I went back to double check and I saw that there were a
few problems.

Okay.

253.perlbmk builds fine but crashes when running through the spec test.
I recall seeing a note in a few places on the website that said perlbmk
didn't work properly due to a longjmp bug; is that still a known bug or
should I try running that through a debugger to find the problem?

Hrm, that's an interesting question. To get setjmp/longjmp working, you
have to pass -enable-correct-eh-support into the 'llc' command. I don't
think there is any way to get the -native-cbe option to do this, so if you
want to test it, you'll have to use the commands above to compile it by
hand. :(

I built perlbmk manually and specified the -enable-correct-eh-support option. Looks like that test is working fine now.

Patrick

>> Interesting. Now that you mention it, I do recall thinking the loops
>> that llvm generated looked a bit different than the gcc loops. I'll go
>> back and take another look, but this might explain some of that
>> discrepancy.
>
> I'll try to put together a "solution" for this some time in the near
> future. Since right now we depend on the CBE for performance, and
> because so many people use GCC, we really are REQUIRED to cover for
> this if we want to provide competitive performance. I imagine that
> this should improve loop-intensive codes substantially.

What is the rationale for going LLVM code -> C backend code ->
compiling that with GCC anyway? If the person specifies -native-cbe,
why not just compile the code as is?

I'm not sure I understand what you mean. Why not just compile it with GCC
natively?

>> inlining stuff

I did play around with the inlining threshold to make file_read and zip
get inlined in more spots. This didn't make any difference, though, so
the main problem seems to be the loops, as you pointed out in your last
email. Wouldn't this be a non-issue if the -native-cbe flag just meant
compiling the given code directly, rather than going through LLVM code
and the C backend?

Well, the whole idea of LLVM is to do lots of interesting interprocedural
optimizations (which GCC cannot do). If you look at the X86 world, where
this loop issue is less of a problem, we can speed up some programs (like
179.art) by about 2x over native GCC, and can get 20-40% speedups on quite
a few programs. Also, we are only starting to add some of the important
optimizations and analyses that will really make LLVM shine.

The problem is just that our native code generators either don't exist
(for PPC) or are not yet quite competitive (on X86), though we are making
progress and getting closer all of the time.

I must say that I would MUCH rather work on making LLVM better than add
hacks to work around GCC's deficiencies (the Intel compiler has no problem
grokking the LLVM-produced C code, for example), but there is no way we
will get decent PPC performance until we have a native code generator for
it or can convince GCC to do reasonable things.

-Chris

> I suspect that a large reason that LLVM does worse than a native C
> compiler with the CBE+GCC is that LLVM generates very low-level C code,
> and I'm not convinced that GCC is doing a very good job (i.e., without
> syntactic loops).

Yup, this is EXACTLY what is going on.

I took this very simple C function:

int Array[1000];
void test(int X) {
  int i;
  for (i = 0; i < 1000; ++i)
    Array[i] += X;
}

Compiling it with -O3 on OS X gave me this:

_test:
        mflr r5
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        mflr r2
        mtlr r5
        addis r4,r2,ha16(L_Array$non_lazy_ptr-"L00000000001$pb")
        li r2,0
        lwz r9,lo16(L_Array$non_lazy_ptr-"L00000000001$pb")(r4)
        li r4,1000
        mtctr r4
L9:
        lwzx r7,r2,r9 ; load
        add r6,r7,r3 ; add
        stwx r6,r2,r9 ; store
        addi r2,r2,4 ; Increment pointer
        bdnz L9 ; Decrement count register, branch while not zero
        blr

This is nice code, good GCC. :)

Okay, I changed the C backend to emit syntactic loops around the real
loops, and it seems to make a big difference. LLVM now generates this
code (note that the syntactic loop is not actually responsible for control
flow; its back edge is unreachable):

void test(int l7_X) {
  unsigned l8_indvar;
  unsigned l8_indvar__PHI_TEMPORARY;
  int *l14_tmp_2E_4;
  int l7_tmp_2E_7;
  unsigned l8_indvar_2E_next;

  l8_indvar__PHI_TEMPORARY = 0u; /* for PHI node */
  goto l13_no_exit;

  do { /* Syntactic loop 'no_exit' to make GCC happy */
l13_no_exit:
  l8_indvar = l8_indvar__PHI_TEMPORARY;
  l14_tmp_2E_4 = &Array[l8_indvar];
  l7_tmp_2E_7 = *l14_tmp_2E_4;
  *l14_tmp_2E_4 = (l7_tmp_2E_7 + l7_X);
  l8_indvar_2E_next = l8_indvar + 1u;
  l8_indvar__PHI_TEMPORARY = l8_indvar_2E_next; /* for PHI node */
  if ((l8_indvar_2E_next == 1000u)) {
    goto l13_return;
  } else {
    goto l13_no_exit;
  }

  } while (1); /* end of syntactic loop 'no_exit' */
l13_return:
  return;
}

Instead of:

void test(int l7_X) {
  unsigned l8_indvar;
  unsigned l8_indvar__PHI_TEMPORARY;
  int *l14_tmp_2E_5;
  int l7_tmp_2E_9;
  unsigned l8_indvar_2E_next;

  l8_indvar__PHI_TEMPORARY = 0u; /* for PHI node */

l13_no_exit:
  l8_indvar = l8_indvar__PHI_TEMPORARY;
  l14_tmp_2E_5 = &Array[l8_indvar];
  l7_tmp_2E_9 = *l14_tmp_2E_5;
  *l14_tmp_2E_5 = (l7_tmp_2E_9 + l7_X);
  l8_indvar_2E_next = l8_indvar + 1u;
  if (!(l8_indvar_2E_next == 1000u)) {
    l8_indvar__PHI_TEMPORARY = l8_indvar_2E_next; /* for PHI node */
    goto l13_no_exit;
  }
  return;
}

The new CBE generated code causes GCC to compile the function into:

_test:
        mflr r5
        bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
        mflr r4
        mtlr r5
        addis r2,r4,ha16(_Array-"L00000000001$pb")
        li r4,1000
        mtctr r4
        la r9,lo16(_Array-"L00000000001$pb")(r2)
        li r2,0
L10:
L2:
        lwzx r7,r2,r9
        add r6,r7,r3
        stwx r6,r2,r9
        addi r2,r2,4
        bdnz L10
L7:
        blr

... which is exactly what we want. Patrick, I would appreciate it if you
could rerun your tests on PPC and let me know if this helps. :)

-Chris

Sorry for the delayed response; we're having finals here, so I've been pretty distracted studying for those. I'll grab the latest from CVS tonight, rerun the tests, and see what the difference is.

Patrick

Okay, I changed the C backend to emit syntactic loops around the real
loops, and it seems to make a big difference. LLVM now generates this
code (note that the syntactic loop is not actually responsible for control
flow; its back edge is unreachable):

... which is exactly what we want. Patrick, I would appreciate it if you
could rerun your tests on PPC and let me know if this helps. :)

-Chris

OK, I ran through the SPEC suite with the new LLVM. Unfortunately, it doesn't seem to have made anything faster. Most of the benchmarks are the same (within noise), but gzip got even slower. I put together a quick page showing the SPEC scores and their differences between LLVM versions. You can see it here: www.valtrain.com/files/LLVM-compare.html

Due to the aforementioned finals, I haven't had a chance to do any profiling or investigate why the syntactic loop change made it slower, but I'll try to look at that some more this weekend.

Patrick

Okay, I changed the C backend to emit syntactic loops around the real
loops, and it seems to make a big difference. LLVM now generates this
code (note that the syntactic loop is not actually responsible for control
flow; its back edge is unreachable):

... which is exactly what we want. Patrick, I would appreciate it if you
could rerun your tests on PPC and let me know if this helps. :)

Aside from this syntactic loop stuff, I was looking over gzip some more and found another area that could be improved.

In gzip's longest_match function, part of the code generated by the CBE looks like this:

l13_shortcirc_next_2E_11:
  l8_chain_length_2E_039 = (((l2_tmp_2E_182) ? (4294967295u) : (0u))) + l8_chain_length_2E_1;

  /* ... some other code ... */

l13_loopcont_2E_0:
  /* ... some other code ... */
  l2_tmp_2E_182 = l8_tmp_2E_180 > l8_mem_tmp_2E_0;

  if (l2_tmp_2E_182) {
    goto l13_shortcirc_next_2E_11;
  } else {
    goto l13_UnifiedReturnBlock;
  }

Basically, it does the comparison, stores the result in l2_tmp_2E_182, and then uses that in both the if check and the ternary expression. When this is compiled, the assembly GCC generates for that comparison/assignment is:

        subc r29,r25,r2
        subfe r29,r29,r29
        neg r29,r29

This is pretty slow compared to just doing the comparison where it is used (which lets GCC use a plain compare instruction). If I manually edit the code to change it to:

l13_shortcirc_next_2E_11:
  l8_chain_length_2E_039 = (((l8_tmp_2E_180 > l8_mem_tmp_2E_0) ? (4294967295u) : (0u))) + l8_chain_length_2E_1;

  /* ... some other code ... */

l13_loopcont_2E_0:

  if (l8_tmp_2E_180 > l8_mem_tmp_2E_0) {
    goto l13_shortcirc_next_2E_11;
  } else {
    goto l13_UnifiedReturnBlock;
  }

then the generated assembly becomes a cmplw and a branch at that point.

Making this change in just this one spot decreases the run time by 69 seconds, a 6% speedup over the 5/12 LLVM CVS. I noticed several spots in the CBE code where this type of code was generated, so if the CBE emitted code the second way, it would presumably help even more.

Lastly, did you ever hear anything back from that group that was working on the PPC JIT compiler? :)

Thanks,

Patrick

Making this change in just this one spot decreases the run time by 69
seconds, a 6% speedup over the 5/12 LLVM CVS. I noticed several spots
in the CBE code where this type of code was generated, so if the CBE
emitted code the second way, it would presumably help even more.

That is *HUGE*. I will definitely look at fixing this, and I'll let you
know when it's ready. Thanks for identifying this problem!

Lastly, did you ever hear anything back from that group that was
working on the PPC JIT compiler? :)

No, I didn't. I'll poke them again, but at this point I'm not feeling too
hopeful.

-Chris

Making this change in just this one spot decreases the run time by 69
seconds, a 6% speedup over the 5/12 LLVM CVS. I noticed several spots
in the CBE code where this type of code was generated, so if the CBE
emitted code the second way, it would presumably help even more.

BTW, the CBE does this now. :)

Lastly, did you ever hear anything back from that group that was
working on the PPC JIT compiler? :)

No. Unfortunately, at this point, I think it is best to consider the work
lost. :( If you (or anyone else) are interested in working on the
existing PPC skeleton in CVS, please feel free, and don't be shy about
asking questions. :)

-Chris