Aliasing of volatile and non-volatile

A customer has reported a performance problem, which I have eventually tracked down to the following situation:

Consider this program:

int foo(int *p, volatile int *q, int n) {
   int i, s = 0;
   for (i = 0; i < n; ++i)
     s += *p + *q;
   return s;
}

LLVM's analysis indicates that *p and *q can alias, even though *p is non-volatile whereas *q is volatile. I don't have the exact section from the C standard, but if I remember correctly, accessing volatile memory via a non-volatile object results in an undefined behavior. This would suggest that volatiles and non-volatiles may be considered not to alias automatically, even if TBAA would not be able to prove it.

The LLVM's code (on x86) at -O2 looks like this:

         .text
         .globl foo
         .align 16, 0x90
         .type foo,@function
foo: # @foo
         .cfi_startproc
# BB#0: # %entry
         xorl %eax, %eax
         testl %edx, %edx
         jle .LBB0_2
         .align 16, 0x90
.LBB0_1: # %for.body
         addl (%rdi), %eax
         addl (%rsi), %eax
         decl %edx
         jne .LBB0_1
.LBB0_2: # %for.end
         ret
.Ltmp0:
         .size foo, .Ltmp0-foo
         .cfi_endproc
         .section ".note.GNU-stack","",@progbits

For comparison, GCC has only one load in the loop:

         .text
         .p2align 4,15
.globl foo
         .type foo, @function
foo:
.LFB0:
         .cfi_startproc
         xorl %eax, %eax
         testl %edx, %edx
         jle .L3
         movl (%rdi), %r8d
         xorl %ecx, %ecx
         .p2align 4,10
         .p2align 3
.L4:
         movl (%rsi), %edi
         addl $1, %ecx
         addl %r8d, %edi
         addl %edi, %eax
         cmpl %edx, %ecx
         jne .L4
.L3:
         rep
         ret
         .cfi_endproc
.LFE0:
         .size foo, .-foo
         .ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
         .section .note.GNU-stack,"",@progbits

The specifics in our case were that the AliasSetTracker indicated mod/ref for all pointers in the loop, where one of them was used in a volatile load, and all others were used in non-volatile loads. As a result, the loop had more loads than necessary.

Any thoughts? Has there been any consideration for this type of a situation?

-K

Accessing a variable declared as "volatile int" through a non-volatile
pointer isn't allowed. Accessing a variable declared as non-volatile "int"
through a volatile pointer is allowed.

You might be able to reason out that either p and q don't alias, or q
points to non-volatile memory... but that's substantially more complicated
than what you're proposing.

(The section in the standard is 6.5p7.)

-Eli

Furthermore, LLVM IR does not have the concept of "volatile variables". In
LLVM IR, operations are volatile, not storage. There are no semantic rules
prohibiting the mixing of volatile and non-volatile accesses.

Dan

Ok, that's not a problem. The problem here is the invariance of the non-volatile load. In the alias set tracker, the alias set that contains a volatile load is marked as "mod/ref". This then implies that any other load (volatile or not) from that set is not invariant in a given region. For that "variance" to happen that load would have to be accessing volatile memory, which isn't legal for non-volatile loads.

-K

Are you sure this is an alias problem?

What is happening is LLVM is leaving the code looking like this:

int foo(int *p, volatile int *q, int n) {
int i, s = 0;
for (i = 0; i < n; ++i)
s += *p + *q;
return s;
}

but GCC is changing to code to look like this:

int foo(int *p, volatile int *q, int n) {
int i, s = 0;

int t;

t = *p;

for (i = 0; i < n; ++i)
s += t + *q;
return s;
}

GCC is raising the *p out of the loop, recognizing the fact that memory access is more expensive than reg access.

What is preventing LLVM from doing the same?

It could have been even faster if it wished and changed the code to this:

int foo(int *p, volatile int *q, int n) {
int i, s = 0;

int t;

t = *p;

s += t * n;

for (i = 0; i < n; ++i)
s += *q;
return s;
}

Could the problem also be looked at like this:
1) If *p and *q are aliased, the results of *p is undefined.
2) If *p and *q and not aliased, the results of *p is defined.
Surely, in this sort of situation, optimizations should should be done
assuming (2), because if the case is (1) the result it undefined, so doing
(2) won't change that.

Kind Regards

James

Are you sure this is an alias problem?

In principle no, but in some sense yes---it's related to the way AliasSetTracker treats volatile references.

but GCC is changing to code to look like this:

int foo(int *p, volatile int *q, int n) {
   int i, s = 0;
   int t;
   t = *p;
   for (i = 0; i < n; ++i)
     s += t + *q;
   return s;
}

GCC is raising the *p out of the loop, recognizing the fact that memory
access is more expensive than reg access.
What is preventing LLVM from doing the same?

When a volatile load is added to an AliasSetTracker, the alias set associated with the load's memory location is marked as "mod/ref". This means that even if the alias set only contains loads (i.e. is "ref" but not "mod"), then after adding a volatile load it will become
"mod" as well. This will prevent any load from being moved out of a loop in LICM, since LICM uses AST to see if the load's location has been changed in the loop.

I have posted a patch on the llvm-commits mailing list:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130902/187072.html

-K