array initialization with memcpy vs memset

Hello,

I recently observed a curious interaction between the code Clang generates and the optimizations opt is able to perform, and I'd appreciate it if anyone could shed some light on the rationale behind this code generation decision.

When we have code like:

int foo(int idx) {
  int array[10] = {1,2,3,4,5,6};
  return array[idx];
}

Clang generates a memset to 0, followed by 6 stores (pattern A). Abridged version:

define i32 @_Z3fooi(i32 %idx) #0 {
entry:
  %array = alloca [10 x i32], align 16
  store i32 %idx, i32* %idx.addr, align 4, !tbaa !2
  %1 = bitcast [10 x i32]* %array to i8*
  call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 40, i32 16, i1 false)
  %2 = bitcast i8* %1 to [10 x i32]*
  %3 = getelementptr [10 x i32], [10 x i32]* %2, i32 0, i32 0
  store i32 1, i32* %3
  […]

Now, if we add one extra element to the initializer (note that the size of the array is still 10), we get a memcpy from a constant internal global (pattern B). Abridged version:

int foo(int idx) {
  int array[10] = {1,2,3,4,5,6,7};
  return array[idx];
}

@_ZZ3fooiE5array = private unnamed_addr constant [10 x i32] [i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 0, i32 0], align 16

define i32 @_Z3fooi(i32 %idx) #0 {
entry:
  %idx.addr = alloca i32, align 4
  %array = alloca [10 x i32], align 16
  store i32 %idx, i32* %idx.addr, align 4
  %0 = bitcast [10 x i32]* %array to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* bitcast ([10 x i32]* @_ZZ3fooiE5array to i8*), i64 40, i32 16, i1 false)
  […]

opt with -O3 on pattern B is able to replace uses of the alloca with uses of the constant global. In particular, the InstructionCombining pass has code that pattern-matches an alloca followed by a memcpy followed by reads, turning them into reads from the global constant array and removing the alloca altogether. On pattern A, on the other hand, opt can't do anything, and the code stays pretty much untouched. (Note that adding a const to the array would make the whole issue go away, but for the purposes of this discussion let's ignore that.)
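For concreteness, here is roughly what opt -O3 leaves behind for pattern B. This is a hand-written sketch rather than verbatim opt output, and the value names (%idxprom, %arrayidx) are illustrative, but it shows the idea: the alloca and memcpy are gone and the load reads straight from the constant global.

define i32 @_Z3fooi(i32 %idx) #0 {
entry:
  ; the alloca/memcpy pair has been removed; the read goes directly to the global
  %idxprom = sext i32 %idx to i64
  %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* @_ZZ3fooiE5array, i64 0, i64 %idxprom
  %0 = load i32, i32* %arrayidx, align 4
  ret i32 %0
}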

The decision about which pattern to use is made in lib/CodeGen/CGDecl.cpp::shouldUseMemSetToInitialize, but the comments there weren't very enlightening.

Any insight on this, examples where pattern A is better, or suggestions for a different heuristic would be appreciated. This might be a case for improving opt instead, but I'd like to hear from the Clang side too.

Recently, rC337887 added a few more possibilities by also considering memsets to values != 0, but it doesn't address the problem presented here.
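If I read that change correctly (I'm going only by its description of memsets to non-zero values, so this is my own example, not one from the commit), it covers initializers like the following, where every byte of every element is identical and a single non-zero memset suffices:

int foo(int idx) {
  // every element is -1, i.e. every byte is 0xFF, so the whole 40 bytes
  // can be initialized with one memset to 0xFF
  int array[10] = {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1};
  return array[idx];
}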

Thanks!