GCC vs. LLVM difference on simple code example

Hi,

I have a question on why gcc and llvm-gcc compile the following simple code snippet differently:

extern int a;
extern int *b;

void foo() {
    int i;
    for (i = 1; i < 100; ++i)
        a += b[i];
}

gcc hoists the load of the global variable “b” out of the loop, while llvm-gcc keeps it inside the loop. This makes the llvm-gcc code slower, and I’m wondering why this choice is made. Is it because of the memory consistency model? Does the C standard say whether a global variable used inside a function must be loaded at the point of each use, or whether the compiler may load it earlier in the function? I had always thought it was legal to hoist the load of a global variable out of a loop as long as the variable is not declared volatile.

Here is the x86 assembly code generated by gcc 4.5.2. The load of “b” is the first instruction of foo, before the loop label .L2:

    .file "foo.c"
    .text
    .p2align 4,15
    .globl foo
    .type foo, @function
foo:
    movl b, %ecx
    movl $1, %eax
    movl a, %edx
    pushl %ebp
    movl %esp, %ebp
    .p2align 4,7
    .p2align 3
.L2:
    addl (%ecx,%eax,4), %edx
    addl $1, %eax
    cmpl $100, %eax
    movl %edx, a
    jne .L2
    popl %ebp
    ret
    .size foo, .-foo
    .ident "GCC: (GNU) 4.5.2"
    .section .note.GNU-stack,"",@progbits

And here is the code produced by llvm-gcc 4.2.1:

    .file "foo.c"
    .text
    .globl foo
    .align 16, 0x90
    .type foo,@function
foo:
    pushl %ebp
    movl %esp, %ebp
    movl $1, %eax
    movl a, %ecx
    .align 16, 0x90
.LBB0_1:
    movl b, %edx
    addl (%edx,%eax,4), %ecx
    movl %ecx, a
    incl %eax
    cmpl $100, %eax
    jne .LBB0_1
    popl %ebp
    ret
.Ltmp0:
    .size foo, .Ltmp0-foo
    .section .note.GNU-stack,"",@progbits
    .ident "GCC: (GNU) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build)"

This is a missed optimization, and it appears to be fixed in newer versions.

-Eli

The difference here is that llvm-gcc doesn’t support the -fstrict-aliasing flag. If you pass -fno-strict-aliasing to gcc, you’ll probably get code similar to llvm-gcc’s. Note that clang does support -fstrict-aliasing.
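
To make the two code shapes concrete, here is a rough source-level sketch of what each compiler effectively generated. This is one reading of the explanation above, not a statement about either compiler's internals: with type-based alias analysis, a compiler may assume that accesses through an int lvalue (the store to a, the load of b[i]) do not touch the int * object b, so b can be loaded once. Without that assumption, reloading b on every iteration is the conservative choice. The definitions of a and b and the names foo_reload, foo_hoisted and run are hypothetical, added only so the sketch is self-contained; the original post only declares a and b as extern.

```c
#include <stdio.h>

/* Hypothetical definitions; the original post has only extern declarations. */
int a;
int *b;

/* Shape of the llvm-gcc 4.2.1 output: b is reloaded on every iteration,
   the conservative choice when the compiler cannot prove that nothing
   in the loop writes to b. */
void foo_reload(void) {
    int i;
    for (i = 1; i < 100; ++i) {
        int *p = b;          /* load of b inside the loop */
        a += p[i];
    }
}

/* Shape of the gcc 4.5.2 output: b is loaded once, before the loop. */
void foo_hoisted(void) {
    int *p = b;              /* load of b hoisted out of the loop */
    int i;
    for (i = 1; i < 100; ++i)
        a += p[i];
}

/* Hypothetical helper: points b at an array of ones, resets a,
   runs one variant, and returns the resulting value of a. */
int run(void (*fn)(void)) {
    static int data[100];
    int i;
    for (i = 0; i < 100; ++i)
        data[i] = 1;
    a = 0;
    b = data;
    fn();
    return a;
}
```

In any program that respects the C aliasing rules, both variants compute the same value of a; calling run(foo_reload) and run(foo_hoisted) is a quick way to compare them. To see the effect in a real compiler, compare the assembly gcc produces for the original foo with and without -fno-strict-aliasing.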

-Chris