x86 cogen quality

Hi, I have a question about x86 code quality.

I have run a few benchmarks and compared the
running time of executables created by LLVM to
executables created by gcc.

It appears that code generated by LLVM is 1.5x to 3x
slower than code generated by gcc on the x86.

For some of the benchmarks the linear scan regalloc
works. When it does, results are in the x1.0 - 1.5
range. Unfortunately, the linear scan allocator breaks
on most of my code.

Question:
1) Do my observations fit your general experience ?

I haven't looked into the details of the generated
x86 code. I have the following observation, though:

When using gcc as a backend (compiling to the 'c' target
and then recompiling with gcc) results are generally a lot
better than just using the LLVM->x86 backend. This
indicates that the performance difference is mostly
located in the LLVM->x86 backend. Further, for those
of my codes where the new allocator works, results are
much better. Whether this is due to the allocator, or
some interaction between it and cogen, I do not know.

Currently, I am just playing with LLVM, but the longterm
plan is to build a new backend for a new machine. It won't
be register starved as the x86 is.

Question:
2) Is there a similar performance differential between
    LLVM->sparc and gcc on sparc, or are they much closer
    because the sparc has more registers and thus should
    be less dependent on good register allocation ?

3) What is the expected timeframe for the new regalloc to
    become stable ? .. or perhaps I should make a more general
    question: what is the perceived status in terms of performance
    for the two compiler backends and for the compiler backend
    part of the infrastructure ?

Finally
I think LLVM looks *very* nice and appears to be a substantial
contribution to the world of open source compiler infrastructure.

Best regards, and thanks in advance,
/Finn

Hi, I have a question about x86 code quality.

I have run a few benchmarks and compared the
running time of executables created by LLVM to
executables created by gcc.

It appears that code generated by LLVM is 1.5x to 3x
slower than code generated by gcc on the x86.

For some of the benchmarks the linear scan regalloc
works. When it does, results are in the x1.0 - 1.5
range. Unfortunately, the linear scan allocator breaks
on most of my code.

Question:
1) Do my observations fit your general experience ?

Yes, they do. I assume you are working with LLVM 1.2?

I haven't looked into the details of the generated
x86 code. I have the following observation, though:

When using gcc as a backend (compiling to the 'c' target
and then recompiling with gcc) results are generally a lot
better than just using the LLVM->x86 backend. This
indicates that the performance difference is mostly
located in the LLVM->x86 backend. Further, for those
of my codes where the new allocator works, results are
much better. Whether this is due to the allocator, or
some interaction between it and cogen, I do not know.

The LLVM 1.2 X86 code quality problems are due to a couple of serious
issues.

1. The default register allocator is a purely local algorithm, which
   cannot hold (e.g.) the counter of a loop in a register across the loop.
   This is *clearly* bad, and switching to the new allocator obviously
   makes a big difference :)
2. Even with the new allocator, we are not able to globally allocate
   floating point registers (yet), due to some interaction with the X86
   floating point stack. This is just something that needs to be worked
   on, but unfortunately no one has had time to do the work recently.
3. When compiling with the native X86 backend, very little additional
   optimization is performed. When compiling with the C backend & GCC,
   GCC does its own optimizations that can make a big difference. For
   example, LLVM 1.2 could only index into arrays with 64-bit integers
   (the getelementptr only accepted a 'long' operand). This could cause
   huge performance problems on the X86, which the GCC optimizer happily
   stomped out. (this issue has been fixed in LLVM CVS:
   http://llvm.cs.uiuc.edu/PR309)
4. In LLVM 1.2, several LLVM->LLVM optimizations were doing very obviously
   silly things, and have subsequently been fixed. See the "1.3" release
   notes for information: http://llvm.cs.uiuc.edu/docs/ReleaseNotes.html
5. One of our goals for LLVM 1.3 is to get one of the scalable pointer
   analyses that I have been working on turned on by default in the
   optimizing linker. This should have a pretty noticeable performance
   impact.
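
To make point 1 concrete, here is a minimal C++ sketch of the kind of loop that suffers. The function name sumArray is hypothetical and not taken from LLVM or from the benchmarks discussed in this thread. A purely local allocator assigns registers one basic block at a time, so values that stay live around the loop (the counter and the running sum) would typically be stored to the stack and reloaded on every iteration, whereas a global allocator such as linear scan can keep them in registers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example function (not from LLVM): sums the first n elements.
// Both 'i' and 'sum' are live across the loop back edge, i.e. across
// basic-block boundaries, which is exactly what a purely local register
// allocator cannot keep in registers.
int sumArray(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];  // loop body: one load, one add, one counter update
    }
    return sum;
}
```

Comparing the assembly produced for a function like this with and without -regalloc=linearscan should show whether the counter stays in a register across iterations.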

Currently, I am just playing with LLVM, but the longterm
plan is to build a new backend for a new machine. It won't
be register starved as the x86 is.

Of the above, #1 would directly affect your target, #2 is X86 specific, #3
would have affected your target if it's 32-bit or smaller, #4 would have
hurt your target, and #5 will almost certainly help your target.

Question:
2) Is there a similar performance differential between
    LLVM->sparc and gcc on sparc, or are they much closer
    because the sparc has more registers and thus should
    be less dependent on good register allocation ?

I truly have no idea. I don't use the Sparc target very much, and I don't
know if anyone has looked into the actual performance of it. One of the
problems is that the LLVM Sparc backend doesn't share much code with the
target-independent code generator, so it's very hard to compare. Our
long-term goal is to merge the sparc code generator into the
target-independent code paths.

3) What is the expected timeframe for the new regalloc to
    become stable ?

I am hoping/planning for the new allocator to be in LLVM 1.3 as the
default allocator. From what I understand there is one bug left related
to spill code insertion, but Alkis has been very busy with other projects
(it's nearing the end of the semester already :). If he doesn't get to
it by 1.3, I will.

    .. or perhaps I should make a more general
    question: what is the perceived status in terms of performance
    for the two compiler backends and for the compiler backend
    part of the infrastructure ?

At this point we haven't actually spent a lot of time evaluating and
measuring code quality. In fact if you notice a piece of code that is not
being optimized or code generated well, please file a bug (with a
suggestion on what the code should have been compiled to). Generally we
separate optimizations into the categories of LLVM->LLVM and codegen
optimizations, but both are important.

Finally I think LLVM looks *very* nice and appears to be a substantial
contribution to the world of open source compiler infrastructure.

Thanks! If you have any more questions, please feel free to ask.

-Chris

For some of the benchmarks the linear scan regalloc
works. When it does, results are in the x1.0 - 1.5
range. Unfortunately, the linear scan allocator breaks
on most of my code.

Is there a chance you can try cvs? I would be interested to
get a simplified test case where the allocator breaks. A lot of
improvements went into the x86 backend since 1.2 and we currently have
no test cases where the allocator breaks today.

Currently, I am just playing with LLVM, but the longterm
plan is to build a new backend for a new machine. It won't
be register starved as the x86 is.

It would be very interesting to see the performance difference between
linear scan and local allocators on a machine that is less spill
happy than the x86. In that case I expect to see a much bigger difference
between the two.

3) What is the expected timeframe for the new regalloc to
   become stable ? .. or perhaps I should make a more general
   question: what is the perceived status in terms of performance
   for the two compiler backends and for the compiler backend
   part of the infrastructure ?

As Chris said, I have been held back by other projects. I hope that
right after finals I will have some time to fix the regression of the
linear scan register allocator. There are some improvements I have in
mind as well, so expect the linear scan register allocator to be much
better in 1.3.

Alkis Evlogimenos wrote:

For some of the benchmarks the linear scan regalloc
works. When it does, results are in the x1.0 - 1.5
range. Unfortunately, the linear scan allocator breaks
on most of my code.
   
Is there a chance you can try cvs? I would be interested to get a simplified test case where the allocator breaks. A lot of improvements went into the x86 backend since 1.2 and we currently have no test cases where the allocator breaks today.

I would, if I could.

However, it seems that there are a lot of changes since release 1.2.
The cvsweb interface only allows me to download one file at a time.
I have grabbed "llvm/lib/CodeGen/RegAllocLinearScan.cpp" and
run make and make install.

But the problem is still there. The error message says:
   lli: /home/finna/llvm/llvm/include/llvm/Target/MRegisterInfo.h:144:
   static bool llvm::MRegisterInfo::isPhysicalRegister(unsigned int):
   Assertion `Reg && "this is not a register!"' failed.

But trying cvsweb I cannot locate the file mentioned above. I guess
you have removed it which likely means there are many files I should update.
But cvsweb is not the right interface for that.

How do I proceed ?

Best regards
/Finn

You can check out the whole CVS tree at once, which is going to be a lot
easier than pulling it down from CVSweb :) Here are the instructions:
http://llvm.cs.uiuc.edu/docs/GettingStarted.html#checkout

-Chris

Chris Lattner wrote:

You can check out the whole CVS tree at once, which is going to be a lot
easier than pulling it down from CVSweb :) Here are the instructions:
http://llvm.cs.uiuc.edu/docs/GettingStarted.html#checkout

Ouch, how embarrassing - I looked for that place,
but apparently failed to notice it. Sorry.

Anyhow, I checked it out as described (in a clean directory), recompiled
and ... the problem is still there. Stranger still, it also fails to work with
the "simple" regalloc.

I will submit a bug, but first I must figure out how to use the bugpoint
utility.

Just to rule out one case of confusion:
1) I build the tools with no options, except for pointing out to
   configure, where the llvmgcc is.
2) I try to enable linear scan register allocation by supplying
    "-regalloc=linearscan" to "lli" - no other options are needed, right ?

The code triggering the bug is the "lame" benchmark in the "mibench"
benchmark suite, by the way.

best regards,
/Finn

Finn S Andersen wrote:

Chris Lattner wrote:

You can check out the whole CVS tree at once, which is going to be a lot
easier than pulling it down from CVSweb :) Here are the instructions:
http://llvm.cs.uiuc.edu/docs/GettingStarted.html#checkout

Ouch, how embarrassing - I looked for that place,
but apparently failed to notice it. Sorry.

Anyhow, I checked it out as described (in a clean directory), recompiled
and ... the problem is still there. Stranger still, it also fails to work with
the "simple" regalloc.

I will submit a bug, but first I must figure out how to use the bugpoint
utility.

Just to rule out one case of confusion:
1) I build the tools with no options, except for pointing out to
  configure, where the llvmgcc is.
2) I try to enable linear scan register allocation by supplying
   "-regalloc=linearscan" to "lli" - no other options are needed, right ?

This sounds correct.

You may also want to consider trying static code generation with llc, which should take the same option. It will generate a native assembly file (.s file), which, in turn, you can compile and link with gcc (you could use the assembler directly, but using gcc is much easier as it will link in all necessary libraries).

-- John T.

Alkis Evlogimenos wrote:

Is there a chance you can try cvs? I would be interested to get a simplified test case where the allocator breaks. A lot of improvements went into the x86 backend since 1.2 and we currently have no test cases where the allocator breaks today.

I updated and recompiled and the error is still there. It turns out that I
cannot use the bugpoint utility to narrow down the error, because it
is not a miscompilation and it is not a compiler pass. It is a co-gen pass
and to provoke it I need to pass -regalloc=linearscan to llc or lli,
but the bugpoint utility does not support it.

I attach a small bytecode file that triggers the bug.

My apologies for trying to submit a bug through email to this list,
but there appears to be some problem with bugzilla. Although I have
opened an account, registered a password and confirmed it through
mail, I am still rejected by bugzilla when I try to log in.

I hope you can use the attached bc to narrow down the bug.
Thanks a lot for any help.

a.out.bc (1.07 KB)

Alkis Evlogimenos wrote:

>Is there a chance you can try cvs? I would be interested to
>get a simplified test case where the allocator breaks. A lot of
>improvements went into the x86 backend since 1.2 and we currently have
>no test cases where the allocator breaks today.

I updated and recompiled and the error is still there. It turns out that I
cannot use the bugpoint utility to narrow down the error, because it
is not a miscompilation and it is not a compiler pass. It is a co-gen pass
and to provoke it I need to pass -regalloc=linearscan to llc or lli,
but the bugpoint utility does not support it.

Ah yeah, that's PR#40.

I attach a small bytecode file that triggers the bug.

I can't reproduce this failure with mainline CVS using either lli or llc:

$ lli -regalloc=linearscan a.out.bc
$ echo $status
0

Are you sure that the CVS version is in your path?

My apologies for trying to submit a bug through email to this list,
but there appears to be some problem with bugzilla. Although I have
opened an account, registered a password and confirmed it through
mail, I am still rejected by bugzilla when I try to log in.

No problem. The second-best place to send bug reports is the llvmbugs
mailing list. Please send Misha (Misha Brukman <brukman@cs.uiuc.edu>) and
me details about what went wrong with your bugzilla signup and we'll try to
fix the problem.

-Chris

Chris Lattner wrote:

I can't reproduce this failure with mainline CVS using either lli or llc:

$ lli -regalloc=linearscan a.out.bc
$ echo $status
0

Are you sure that the CVS version is in your path?

After configure and make I run make install, which moves the executables
to /usr/local/bin, right ? And yes, they are in my path.

But thank you very much for trying to reproduce the problem. It must be
some configuration problem on my side then. I do see other strange problems,
such as a long list of type conflicts when linking c++ programs. I will
clean up everything and install all over.

Best regards
/Finn

Finn S Andersen wrote:

Chris Lattner wrote:

I can't reproduce this failure with mainline CVS using either lli or llc:

$ lli -regalloc=linearscan a.out.bc
$ echo $status
0

Are you sure that the CVS version is in your path?

After configure and make I run make install, which moves the executables
to /usr/local/bin, right ? And yes, they are in my path.

I am not sure if the "make install" target will work as expected. Generally, we compile LLVM and set our paths to the OBJECT_ROOT/tools/Debug directory (as described in the LLVM "Getting Started" guide). I seem to recall that someone added some support for the install target, but I don't know if it is complete. If any work has been done, it has probably been registered in BugZilla as an enhancement.

But thank you very much for trying to reproduce the problem. It must be
some configuration problem on my side then. I do see other strange problems,
such as a long list of type conflicts when linking c++ programs. I will
clean up everything and install all over.

Best regards
/Finn

Regards,

-- John T.

Sorry to disturb you all, but I simply cannot get
the linearscan allocator to work. I have upgraded
llvm to mainline cvs. Everything works until I get
to llc -regalloc=linearscan or lli -regalloc=linearscan.

I have installed it on redhat 9 and on Fedora Core
distributions (I even took it as far as to format a new
partition and install Fedora core all over). I have submitted
the bytecode that triggers the error, but IT WORKS
JUST FINE for John (see below). Apparently, the error
only shows itself on my installation.

Guess I must be making some stupid mistake with
my configuration. I do believe that I follow the recipe
given. Now I'm stuck. Any kind of even remote ideas
about what kind of strange dependencies I have violated ?
Anybody running it on RedHat Linux ? RedHat has
been known to ship installations with problems before,
like when they shipped it with a strange gcc version.

Any kind of help is most welcome.

Thanks in advance,
/Finn

John Criswell wrote:

Finn S Andersen wrote:

Sorry to disturb you all, but I simply cannot get
the linearscan allocator to work. I have upgraded
llvm to mainline cvs. Everything works until I get
to llc -regalloc=linearscan or lli -regalloc=linearscan.

I have installed it on redhat 9 and on Fedora Core
distributions (I even took it as far as to format a new
partition and install Fedora core all over). I have submitted
the bytecode that triggers the error, but IT WORKS
JUST FINE for John (see below). Apparently, the error
only shows itself on my installation.

Guess I must be making some stupid mistake with
my configuration. I do believe that I follow the recipe
given. Now I'm stuck. Any kind of even remote ideas
about what kind of strange dependencies I have violated ?
Anybody running it on RedHat Linux ? RedHat has
been known to ship installations with problems before,
like when they shipped it with a strange gcc version.

Any kind of help is most welcome.

We use RedHat in house here, both RedHat 7 and (I believe) 8. We'll be upgrading to RedHat 9 in the future. So, it should work.

I don't know exactly what to suggest, so I'll try to recap to see if there's anything we're overlooking.

Can you send us the following:

1) The LLVM assembly language (.ll file) that is failing.
2) The version of GCC that you are using (gcc -v)
3) A brief description of your hardware, especially if it is something uncommon. I'm assuming you're running on x86. Running on another processor (like Sparc or even AMD's new 64 bit chips) might make a difference.
4) The exact command line that you use to configure LLVM, as well as the name of the source and object directory names you use to store the source and object files.
5) A copy of the Makefile.config file that is created in the object root directory after running configure.

Can you also do the following:

1) Verify that the input file works correctly for the other register allocators.

-- John T.

Finn S Andersen wrote:
> Sorry to disturb you all, but I simply cannot get
> the linearscan allocator to work. I have upgraded
> llvm to mainline cvs. Everything works until I get
> to llc -regalloc=linearscan or lli -regalloc=linearscan.

In addition to what John asked, can you say HOW it's failing? Does the
program crash? Does the register allocator crash (ie, does llc crash)?

Can you send the output of 'llc -o - foo.bc -debug -print-machineinstrs'?

Can you also do the following:

1) Verify that the input file works correctly for the other register
allocators.

No need to try all of them, just make sure the default allocator works.
If I remember right, you were having problems even with "hello world" type
of programs, so I don't think it's your code.

In any case, this is really distressing, and I would like to get this
resolved! :)

-Chris

OK, details:

I run RH8 (gcc 3.2.something), RH9 (gcc 3.2.2-5) and Fedora.
Problems are the same across all setups. Hardware is Athlon 1600+
and half a giga RAM. Runs fail with an assertion when the linear
scan allocator is enabled, but runs without problems otherwise.

On RH9 (the system I have access to while generating this email) I
have the following details:

> 1) The LLVM assembly language (.ll file) that is failing.
- Attached.

a.out.ll (4.49 KB)

Makefile.config (6.48 KB)

linscan (32.4 KB)

a.out.bc (1.07 KB)

Dear Mr. Andersen:

Quick question:

Have you ever built the LLVM source code in /home/finna/llvm (i.e. the source directory)? If so, did you clean it out (make distclean) before switching over to /home/finna/build as the object tree?

-- John T.

Yes, that's exactly what I meant... thanks for reading my mind! :)

It looks like this is where things start to go downhill
(LiveIntervals.cpp:559):

LiveIntervals::Interval::Interval(unsigned r)
    : reg(r),
      weight((MRegisterInfo::isPhysicalRegister(r) ?
              std::numeric_limits<float>::infinity() : 0.0F))
{

For a physical register (EDX in this case) it appears that the interval is
being created, but gets a 0 weight instead of an infinity weight. This
implies that the MRegisterInfo::isPhysicalRegister might have an issue, or
std::numeric_limits has an issue.
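
The two suspects can be separated with a small self-contained C++ sketch. Everything here is a stand-in rather than the actual LLVM source: the Interval struct is reduced to the two fields shown above, isPhysicalRegister is a free function instead of a member of MRegisterInfo, and the boundary of 1024 between physical and virtual register numbers is an assumption inferred from the assert suggested later in this thread.

```cpp
#include <cassert>
#include <limits>

// Assumption: register numbers below 1024 are physical, the rest virtual.
const unsigned kFirstVirtualRegister = 1024;

// Simplified stand-in for MRegisterInfo::isPhysicalRegister.
inline bool isPhysicalRegister(unsigned reg) {
    assert(reg && "this is not a register!");
    return reg < kFirstVirtualRegister;
}

// Reduced stand-in for LiveIntervals::Interval: physical registers get an
// infinite weight so they are never chosen for spilling.
struct Interval {
    unsigned reg;
    float weight;
    explicit Interval(unsigned r)
        : reg(r),
          weight(isPhysicalRegister(r)
                     ? std::numeric_limits<float>::infinity()
                     : 0.0F) {}
};

// The invariant being checked: a physical register must carry a weight
// greater than 1 (in practice, infinity).
inline bool weightLooksSane(const Interval& iv) {
    return iv.reg >= kFirstVirtualRegister || iv.weight > 1;
}
```

If weightLooksSane fails for a register number below 1024, the register test itself is fine and the value returned by std::numeric_limits<float>::infinity() is the likely culprit.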

Could you try compiling and running this program:

Whoops, that should have parens:
  assert((r < 1024 || weight > 1) && "physreg or weight incorrectly computed!");

Sorry about that. :)

-Chris

Chris Lattner wrote:

Could you try compiling and running this program:

---
#include <limits>
#include <iostream>
int main() {
std::cerr << std::numeric_limits<float>::infinity() << "\n";
}
---

Sure thing. It prints "0". Calling that infinity is somewhat
of a stretch, isn't it ?

What on earth is going on here?

Log:

Chris Lattner wrote:

>Could you try compiling and running this program:
>
>---
>#include <limits>
>#include <iostream>
>int main() {
> std::cerr << std::numeric_limits<float>::infinity() << "\n";
>}
>---
>
>
Sure thing. It prints "0". Calling that infinity is somewhat
of a stretch, isn't it ?

Well I think that explains why the linscan allocator doesn't work for you!
:)

It looks like numeric_limits::infinity has had a history of problems:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5527
http://lists.suse.com/archive/suse-programming-e/2003-Mar/0004.html

What on earth is going on here?

I think that we should switch to C constants in this case. Can you try
#include <math.h> and use HUGE_VAL instead?
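
A rough sketch of what that switch could look like, under stated assumptions: spillWeightFor and kFirstVirtualRegister are hypothetical names, the 1024 boundary is inferred from the assert earlier in this thread, and the weight is held as a double because HUGE_VAL is a double constant. This is not the actual LLVM change.

```cpp
#include <cassert>
#include <cmath>  // HUGE_VAL

// Assumption: register numbers below 1024 are physical, the rest virtual.
const unsigned kFirstVirtualRegister = 1024;

// Hypothetical helper: picks the initial spill weight for a register without
// relying on std::numeric_limits. HUGE_VAL comes from the C math header and
// is positive infinity on IEEE 754 platforms.
inline double spillWeightFor(unsigned reg) {
    assert(reg && "this is not a register!");
    return reg < kFirstVirtualRegister ? HUGE_VAL : 0.0;
}
```

With this variant, a physical register such as number 3 gets a weight that compares greater than any finite value, regardless of what the C++ library returns for numeric_limits.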

-Chris