Missed optimization on array initialization

Prompted by a SO post () I checked and found that LLVM yields the same (seemingly) suboptimal code as MSVC. Consider the following, simplified, C snippet:

extern void bar(int*);

void foo(int a)
{
    int ar[100] = {a};
    if (a)
        return;
    bar(ar);
}

Ideally, the array initialization should be sunk after the return, but in Clang/LLVM 3.0 this doesn’t happen:

  %ar = alloca [100 x i32], align 16
  %1 = bitcast [100 x i32]* %ar to i8*
  call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 400, i32 16, i1 false)
  %2 = getelementptr inbounds [100 x i32]* %ar, i64 0, i64 0
  store i32 %a, i32* %2, align 16, !tbaa !0

and this carries straight through to the emitted code (for x64, but x86 is similar). I don’t have ToT at hand, so I don’t know if this is still the case. Any idea why this might be happening?

Hi Carlo, for what it's worth, gcc-4.7 doesn't get this either.

Ciao, Duncan.

This is a straightforward form of code motion that we don’t implement; it would be built on analysis of partially dead stores (stores that are dead along some paths but not others). Our dead store analysis in general isn’t very powerful and cannot see across blocks. That kind of analysis also turns out to be pretty expensive and doesn’t often lead to big performance wins. That said, it is certainly an area that should be improved.
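
To make the transformation concrete, a hand-optimized C equivalent of sinking the initialization past the early return might look roughly like this (illustrative sketch only, not compiler output; foo_sunk is a made-up name):

  extern void bar(int*);

  void foo_sunk(int a)
  {
      if (a)
          return;            /* ar is never observed on this path */
      int ar[100] = {a};     /* a == 0 here, so this is plain zero-initialization */
      bar(ar);
  }

Nothing can observe ar on the early-return path, which is what makes the initialization partially dead and legal to sink.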

I’ll note that the original example from SO is more complex: instead of a single store, a whole loop initializes the array. Handling that case requires moving the entire loop past the early return, which calls for fairly heroic compiler analysis. The saving grace is that the loop is equivalent to a memcpy, so we may be able to handle it someday.
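
The SO example itself isn’t reproduced in this thread; purely as a hypothetical illustration of the pattern being described, the loop case might look something like this (defaults and foo_loop are made-up names):

  extern void bar(int*);
  extern const int defaults[100];    /* hypothetical source table */

  void foo_loop(int a)
  {
      int ar[100];
      for (int i = 0; i < 100; ++i)  /* element-by-element copy, i.e. a memcpy in disguise */
          ar[i] = defaults[i];
      if (a)
          return;                    /* ar is dead on this path */
      bar(ar);
  }

Presumably, once loop-idiom recognition rewrites such a loop as a single memcpy call, sinking it past the early return reduces to the same single-operation code motion discussed above.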

I’m surprised that we’re not shortening the memset to skip the first element, which the store that follows immediately overwrites. That is something that we should be able to handle. Pete, didn’t you implement this a while ago?
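
At the C level, the memset shortening being described would amount to something like the following (illustrative only; in LLVM this would be done on the IR memset intrinsic rather than in source, and both function names are made up):

  #include <string.h>

  /* What Clang/LLVM 3.0 effectively does for "int ar[100] = {a};" */
  void init_current(int *ar, int a)
  {
      memset(ar, 0, 100 * sizeof(int));     /* zeroes all 100 elements, including ar[0] */
      ar[0] = a;                            /* then immediately overwrites the first one */
  }

  /* With the memset shortened to skip the dead first element */
  void init_shortened(int *ar, int a)
  {
      ar[0] = a;
      memset(ar + 1, 0, 99 * sizeof(int));  /* zero only elements 1..99 */
  }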

-Chris

Yeah. I think my implementation only trimmed stores at the end of the memset, but this one is at the start. I’ll take a look at improving that. We’ll probably only want to shorten the start of the memset when doing so doesn’t leave a horribly unaligned start position, but that’s OK here.
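
A small sketch of the end-trim vs. start-trim distinction being drawn here (hypothetical function names; assumes 4-byte int and the 16-byte-aligned buffer from the IR in the original post):

  #include <string.h>

  /* End trim (the already-handled case): the dead element is last, so the
     memset just gets shorter and its start stays 16-byte aligned. */
  void store_last(int *ar, int a)
  {
      memset(ar, 0, 99 * sizeof(int));      /* ar[99] skipped */
      ar[99] = a;
  }

  /* Start trim (this thread's case): skipping ar[0] moves the start to
     ar + 1, which is only 4-byte aligned even though ar is 16-byte
     aligned; hence the point about only shortening when the new start
     position isn't horribly unaligned. */
  void store_first(int *ar, int a)
  {
      ar[0] = a;
      memset(ar + 1, 0, 99 * sizeof(int));
  }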

Pete

Makes perfect sense, thanks!

-Chris