Another missed optimization opportunity?

I was suprised to find that some bitcode I'm generating isn't getting
optimized. Here, I'm doing the equivalent of "myarray[5]++" (on an
"extern int *myarray"), repeated three times:

     @myarray = external global i32*

     define void @update_array() #0 {
       %1 = load i32** @myarray, align 8
       %2 = getelementptr inbounds i32* %1, i64 5
       %3 = load i32* %2, align 4
       %4 = add nsw i32 %3, 1
       store i32 %4, i32* %2, align 4
       %5 = load i32** @myarray, align 8
       %6 = getelementptr inbounds i32* %5, i64 5
       %7 = load i32* %6, align 4
       %8 = add nsw i32 %7, 1
       store i32 %8, i32* %6, align 4
       %9 = load i32** @myarray, align 8
       %10 = getelementptr inbounds i32* %9, i64 5
       %11 = load i32* %10, align 4
       %12 = add nsw i32 %11, 1
       store i32 %12, i32* %10, align 4
       ret void
     }

Running "opt -std-compile-opts" or even "opt -O3" doesn't seem to
change the bitcode any. I had expected the three increments by 1 to
be collapsed into a single increment by 3:

     @myarray = external global i32*

     define void @update_array() #0 {
       %1 = load i32** @myarray, align 8
       %2 = load i32* %1, align 4
       %3 = add nsw i32 %2, 3
       store i32 %3, i32* %1, align 4
       ret void
     }

Even the (x86-64) code generator doesn't do any last-minute
optimizations:

     movq myarray(%rip), %rax
     incl 20(%rax)
     movq myarray(%rip), %rax
     incl 20(%rax)
     movq myarray(%rip), %rax
     incl 20(%rax)

This is with LLVM revision 180116.

Is there some semantic reason that the increments aren't allowed to be
combined, or is this a missed optimization opportunity in LLVM?

Thanks,
-- Scott

Hey Scott,

...

Is there some semantic reason that the increments aren't allowed to be
combined, or is this a missed optimization opportunity in LLVM?

I believe that the wildcard is the extern keyword.

Since the external symbol isn't resolved until link time, I suspect that it
would be a legal C program to do something like (maybe the language lawyers
know better though):

cat test.c

extern int x;

int kung( ) {
  return x;
}

cat foo.c

extern int kung();

volatile int x;

int main() {
  x = 0;
  return kung();
}

Ugly programming, but I see no way of the linker having enough information
to determine that the extern should be volatile and issue an warning/error.

One more reason to use explicit barriers instead of volatile. :wink:

-Cameron

6.7.3 Type qualifiers

5 If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue
with non-volatile-qualified type, the behavior is undefined.

-Krzysztof

...

6.7.3 Type qualifiers

5 If an attempt is made to modify an object defined with a const-qualified
type through use of an lvalue with non-const-qualified type, the behavior
is undefined. If an attempt is made to refer to an object defined with a
volatile-qualified type through use of an lvalue
with non-volatile-qualified type, the behavior is undefined.

I stand corrected. Thanks, Krysztof.

-Cameron

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On Behalf Of Scott Pakin
Subject: [LLVMdev] Another missed optimization opportunity?

I'm doing the equivalent of "myarray[5]++" (on an
"extern int *myarray"), repeated three times:

I had expected the three increments by 1 to
be collapsed into a single increment by 3:

Is there some semantic reason that the increments aren't allowed to be
combined, or is this a missed optimization opportunity in LLVM?

Is this a potential aliasing effect? Since myarray is defined as a pointer, not an array, it's theoretically possible that the address therein refers to the same memory location as the pointer itself.

- Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.

I was thinking along those lines, but I haven't been able to come up
with a specific instance of what could possibly be aliased.

Here are some additional observations that might be helpful in solving
this mystery:

   1) The increments are in fact coalesced when myarray is declared as
      an array instead of a pointer:

           @myarray = external global [10 x i32]

           define void @update_array() #0 {
             %1 = load i32* getelementptr inbounds ([10 x i32]* @myarray, i64 0, i64 5), align 4
             %2 = add nsw i32 %1, 3
             store i32 %2, i32* getelementptr inbounds ([10 x i32]* @myarray, i64 0, i64 5), align 4
             ret void
           }

   2) Both clang and gcc optimize the analogous C code into an
      increment by 3:

           extern int *myarray;

           void update_array (void)
           {
             myarray[5]++;
           }

      This would imply that the issue is not related to semantics.

   3) opt *does* optimize the code when myarray is declared as a
      constant pointer to variable data as opposed to a variable
      pointer to variable data:

           @myarray = external constant i32*

      This would imply that the issue *is* related to semantics. Sigh.

-- Scott

Hi Scott,

I was suprised to find that some bitcode I'm generating isn't getting
optimized. Here, I'm doing the equivalent of "myarray[5]++" (on an
"extern int *myarray"), repeated three times:

does your bitcode contain data layout information?

Ciao, Duncan.

The semantic reason is that the optimizer is required to assume that the i32 stores could be storing to the storage of myarray. LLVM IR does not permit optimizers to optimize based on the nominal types of memory objects or memory accesses.

This gets optimized in C, because the C compiler adds special TBAA metadata annotations to the loads and stores which say that the stores of “int” do not interfere with the loads of “pointer”. It also gets optimized if myarray is const, because the optimizer knows that const memory is not modified by stores. It also gets optimized if myarray is an actual array, because then the address of the array is constant, rather than being a value loaded from memory.

Dan

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On Behalf Of Scott Pakin
Subject: Re: [LLVMdev] Another missed optimization opportunity?

> Is this a potential aliasing effect? Since myarray is defined as a
> pointer, not an array, it's theoretically possible that the address
> therein refers to the same memory location as the pointer itself.

I was thinking along those lines, but I haven't been able to come up
with a specific instance of what could possibly be aliased.

The myarray variable could be defined and initialized like this:

    int* myarray = ((int*)&myarray) - 5;

If the initialization occurs external to the module with the incrementation function, the optimizers must assume such silliness is possible in lieu of any information to the contrary, so aliasing can occur.

- Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.

Yes:

     ; ModuleID = 'junk.c'
     target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
     target triple = "x86_64-unknown-linux-gnu"

     @myarray = external global i32*

     ; Function Attrs: nounwind uwtable
     define void @update_array() #0 {
       %1 = load i32** @myarray, align 8
       %2 = getelementptr inbounds i32* %1, i64 5
       %3 = load i32* %2, align 4
       %4 = add nsw i32 %3, 1
       store i32 %4, i32* %2, align 4
       %5 = load i32** @myarray, align 8
       %6 = getelementptr inbounds i32* %5, i64 5
       %7 = load i32* %6, align 4
       %8 = add nsw i32 %7, 1
       store i32 %8, i32* %6, align 4
       %9 = load i32** @myarray, align 8
       %10 = getelementptr inbounds i32* %9, i64 5
       %11 = load i32* %10, align 4
       %12 = add nsw i32 %11, 1
       store i32 %12, i32* %10, align 4
       ret void
     }

     attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }

-- Scott

Ah. Very helpful answer; thanks.

-- Scott

I see. Pretty evil stuff. But that does explain how three myarray[5]++'s
can potentially be different from one myarray[5]+=3.

Thanks for the example,
-- Scott