qualitative comparison of correctness of llvm and gcc

Hi folks,

We recently generated some data that seemed interesting enough to share
here. This is a comparison between compilers that ignores the
performance of the generated code and focuses only on compiler correctness.

                     volatile   checksum
                     errors     errors

avr-gcc-3.4           1.879%     0.378%
avr-gcc-4.1           0.037%     0.256%
avr-gcc-4.2           0.753%     0.225%
gcc-3.4               1.222%     0.004%
gcc-4.0               0.041%     0.026%
gcc-4.1               0.196%     0.024%
gcc-4.2               0.770%     0.004%
gcc-4.3               0.727%     0.003%
llvm-gcc-2.2         18.712%     0.110%
llvm-gcc-pre-2.4      0.000%     0.006%

Notes:

These numbers come from throwing 100,000 random C programs at the
various compilers. For each compiler, we compiled each test case at
-O0, -O1, -O2, -Os, and -O3 and compared the results of these
compilations. Random programs that crashed a compiler, and executables
that failed to terminate in a timely fashion, did not count towards the
100,000.

Target is ia32 except for the top three compilers, which target AVR, an 8-bit MCU. Host is Ubuntu Feisty.

A "volatile error" indicates a case where a compiler failed to respect
the volatile invariant. The volatile invariant is simply that changing
the optimization level of a strictly conforming C program must not
change the number of dynamic loads or stores to any variable that is
volatile-qualified in the compiler's input. We check this with a hacked
valgrind (for x86) or a hacked simulator (for AVR).

A "checksum error" indicates a case where changing the optimization
level of a compiler changed the computation performed by its generated
code in a way that is not correctness-preserving. The checksum is taken
over the global variables in the random program just before it
terminates.

We believe that our random programs are free of unspecified and
undefined behavior (e.g. divide by zero, pointer misuse, uninitialized
variable use, dependency on order of evaluation, etc.). Therefore,
every checksum or volatile error is the result of a real compiler bug.
We can't easily prove this. However, whenever we have looked closely at
an apparent error, it has turned out that the compiler could
legitimately be blamed.

I don't have the exact svn rev of the pre-2.4 llvm that we tested
handy, but it's from a few weeks ago.

The improvement in llvm between 2.2 and now is no doubt the result of
many factors. I would argue that one of the most important was the llvm
team's rapid fixing of 5 volatile bugs and 8 code generation bugs that
we reported. We didn't test 2.3 because some of the bugs we reported
were fixed before that release and some after.

Current llvm is also far harder to crash than 2.2, although we
lack statistics on this.

Since the snapshot a few weeks ago, the llvm team has fixed a few more
bugs. If this process continues, it seems possible that within the
foreseeable future llvm will be effectively free of crashes, codegen
errors, and volatile errors (for the class of random programs that we
generate).

If anyone is curious about the kinds of bugs that are found through
random testing, a list follows.

Volatile:

   http://llvm.org/bugs/show_bug.cgi?id=2182
   http://llvm.org/bugs/show_bug.cgi?id=2234
   http://llvm.org/bugs/show_bug.cgi?id=2260
   http://llvm.org/bugs/show_bug.cgi?id=2262
   http://llvm.org/bugs/show_bug.cgi?id=2496

Codegen:

   http://llvm.org/bugs/show_bug.cgi?id=2271
   http://llvm.org/bugs/show_bug.cgi?id=2274
   http://llvm.org/bugs/show_bug.cgi?id=2276
   http://llvm.org/bugs/show_bug.cgi?id=2294
   http://llvm.org/bugs/show_bug.cgi?id=2364
   http://llvm.org/bugs/show_bug.cgi?id=2435
   http://llvm.org/bugs/show_bug.cgi?id=2479
   http://llvm.org/bugs/show_bug.cgi?id=2487
   http://llvm.org/bugs/show_bug.cgi?id=2568

Crash/hang:

   http://llvm.org/bugs/show_bug.cgi?id=2264
   http://llvm.org/bugs/show_bug.cgi?id=2323
   http://llvm.org/bugs/show_bug.cgi?id=2326
   http://llvm.org/bugs/show_bug.cgi?id=2234
   http://llvm.org/bugs/show_bug.cgi?id=2372
   http://llvm.org/bugs/show_bug.cgi?id=2422
   http://llvm.org/bugs/show_bug.cgi?id=2455
   http://llvm.org/bugs/show_bug.cgi?id=2497
   http://llvm.org/bugs/show_bug.cgi?id=2513
   http://llvm.org/bugs/show_bug.cgi?id=2570

John Regehr

Of course I meant "quantitative comparison."

John

John,

These numbers are terrific! Thank you for compiling them and for the
bugs you filed. From the list below, it looks like there are only 2
outstanding bugs -- one checksum error and one non-terminating
compilation. Very nice!

-bw

Hi John,

> A "volatile error" indicates a case where a compiler failed to respect
> the volatile invariant. The volatile invariant is simply that changing
> the optimization level of a strictly conforming C program must not
> change the number of dynamic loads or stores to any variable that is
> volatile-qualified in the compiler's input. We check this with a hacked
> valgrind (for x86) or a hacked simulator (for AVR).

does this also check that writes are atomic: that they are performed in
one processor operation? For example, I recently fixed a bug where a
write of a double constant on x86-32 (which can be written atomically)
was being turned into a write of an i64, which then got legalized into
two i32 writes.

Ciao,

Duncan.

Hi Duncan-

> does this also check that writes are atomic: that they are performed in
> one processor operation?

Can you elaborate a bit? I don't think volatile has any atomicity
requirements. Of course I can make a struct, an int128_t, or whatever
volatile (on AVR even an int16_t is updated non-atomically!).

Lack of atomicity is one of many problems with using volatile as a basis for creating correct systems code...

Thanks,

John

Hi John,

> does this also check that writes are atomic: that they are performed in
> one processor operation?

> Can you elaborate a bit? I don't think volatile has any atomicity
> requirements. Of course I can make a struct, an int128_t, or whatever
> volatile (on AVR even an int16_t is updated non-atomically!).

that's not entirely true in practice: if I do a 32 bit volatile write
to a memory-mapped i/o location, then I'd be annoyed if that got turned
into four 8 bit volatile writes. And the hardware might react funny as
well! Also, because languages like C currently only have a concept of
a volatile write and no concept of an atomic write (as far as I know),
I suspect it is common practice to use a volatile write of a size that
the processor is known to be able to perform atomically, and expect it
to end up as that atomic write in the final assembler (eg: a write of
a double should not be turned into two i32 writes on x86-32). So I think
it is reasonable to expect LLVM to generate atomic reads and writes for
variables marked volatile if that can be done in a straightforward way
on the platform. So my question is: are you testing whether LLVM does
in fact do this? I appreciate that the C standard does not require it
(AFAIK).

> Lack of atomicity is one of many problems with using volatile as a basis
> for creating correct systems code...

The Ada language has a proper notion of atomic write, but the gcc internal
language does not (AFAIK), presumably due to the C-centric origins of gcc.
So Ada marks the write "volatile", and generates a compile-time error if
the write cannot reasonably be atomic on the target machine. For example,
it allows a double to be atomic (but not an i64) on x86-32.

Ciao,

Duncan.

Hi Duncan,

We currently check that every byte of a volatile is accessed the same
number of times, and that this number doesn't change across
optimization levels.

If LLVM wants to make stronger guarantees that's great; I suspect this
is not hard at all to check. Can you provide a list of types that
should be atomic for the x86 target? For example, would we expect a
struct of size 4 to be atomically accessed? How about pointers?
Bitfields? Actually I think the interaction of volatile and bitfields
is murky, so maybe we don't want to go there.

Also, right now we don't check that the order in which volatiles are
accessed is the same across all optimization levels. Fixing this is on
my list. As long as we generate code with at most one volatile access
between any given pair of sequence points (which we do), the compiler
is not free to reorder.

John

Hi John,

> We currently check that every byte of a volatile is accessed the same
> number of times, and that this number doesn't change across optimization
> levels.
>
> If LLVM wants to make stronger guarantees that's great, I suspect this is
> not hard at all to check. Can you provide a list of types that should be
> atomic for the x86 target? For example would we expect a struct of size 4
> to be atomically accessed? How about pointers? Bitfields? Actually I
> think the interaction of volatile and bitfields is murky so maybe we don't
> want to go there.

on x86-32, 8, 16 and 32 bit integer reads and writes should be atomic,
as should float and double reads and writes (most likely long double
too, but I don't know). Structs with one field should be read and
written atomically if the field is one of the above types [I'm talking
about when you read/write the entire struct; likewise for arrays with
one element]. Note: the reads and writes need to be marked volatile,
otherwise the compiler feels no obligation to keep them atomic.

Best wishes,

Duncan.

Hello,

> These numbers come from throwing 100,000 random C programs at the
> various compilers. For each compiler, we compiled each test case at
> -O0, -O1, -O2, -Os, and -O3 and compared the results of these
> compilations. Random programs that crashed a compiler, and executables
> that failed to terminate in a timely fashion, did not count towards the
> 100,000.

Maybe I missed it in the follow-up discussion (I had to work through
over 500 mails after getting back from holiday, so I wasn't always as
thorough as I should have been), but I was wondering which
technique/tool you used to generate these C programs. Maybe I'm being
naive, but it doesn't seem like an easy task to generate non-trivial C
programs to feed into a compiler...

Also, where would I be able to find more details on this experiment?
That is, is there any paper or technical report on this? Or something
else with more details on the setup of this?

greetings,

Kenneth Hoste
Phd student @ Ghent University, Belgium