Effect of NSW attribute on 'mul' during InstCombine pass?

Hi all,

I’m using LLVM 3.0, for which I’ve filed the following bug: http://llvm.org/bugs/show_bug.cgi?id=12130.
I’m trying to solve this problem myself by digging into the LLVM sources.
It seems that the problem I’m experiencing is related to the presence or absence of the NSW attribute on a ‘mul’.
Consider the following code:

define void @t2(double* %x) {
L.entry:
%a = alloca [2 x i64], align 4
%0 = bitcast [2 x i64]* %a to i64*
store i64 3, i64* %0
%1 = getelementptr [2 x i64]* %a, i32 0, i32 1
store i64 5, i64* %1
%2 = bitcast [2 x i64]* %a to double*
%3 = bitcast double* %2 to i8*
%4 = load i64* %0
%5 = sub i64 %4, 2
%6 = trunc i64 %5 to i32
%7 = mul i32 %6, 8 ; HERE is problematic line #1
%8 = getelementptr i8* %3, i32 %7
%9 = bitcast i8* %8 to double*
%10 = load double* %9
%11 = bitcast double* %x to i8*
%12 = getelementptr i8* %11, i32 8
%13 = bitcast i8* %12 to double*
store double %10, double* %13
ret void
}

If I run opt as follows:

opt -instcombine trb.ll -S -o trb.opt.ll

I get the following generated code:

; ModuleID = 'trb.ll'

define void @t2(double* %x) {
L.entry:
%a = alloca [2 x i64], align 4
%0 = getelementptr inbounds [2 x i64]* %a, i32 0, i32 0
store i64 3, i64* %0
%1 = getelementptr [2 x i64]* %a, i32 0, i32 1
store i64 5, i64* %1
%2 = bitcast [2 x i64]* %a to i8*
%3 = load i64* %0
%4 = add i64 %3, 536870910 ; Problematic line #2
%5 = trunc i64 %4 to i32
%6 = shl i32 %5, 3
%7 = getelementptr i8* %2, i32 %6
%8 = bitcast i8* %7 to double*
%9 = load double* %8
%10 = bitcast double* %x to i8*
%11 = getelementptr i8* %10, i32 8
%12 = bitcast i8* %11 to double*
store double %9, double* %12
ret void
}

If, on problematic line #1, I replace %7 = mul i32 %6, 8 with %7 = mul nsw i32 %6, 8, then opt generates:

; ModuleID = 'trb.ll'

define void @t2(double* %x) {
L.entry:
%a = alloca [2 x i64], align 4
%0 = getelementptr inbounds [2 x i64]* %a, i32 0, i32 0
store i64 3, i64* %0
%1 = getelementptr [2 x i64]* %a, i32 0, i32 1
store i64 5, i64* %1
%2 = bitcast [2 x i64]* %a to i8*
%3 = load i64* %0
%4 = add i64 %3, 4294967294
%5 = trunc i64 %4 to i32
%6 = shl nsw i32 %5, 3
%7 = getelementptr i8* %2, i32 %6
%8 = bitcast i8* %7 to double*
%9 = load double* %8
%10 = bitcast double* %x to i8*
%11 = getelementptr i8* %10, i32 8
%12 = bitcast i8* %11 to double*
store double %9, double* %12
ret void
}
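
As a sanity check on the two constants (536870910 without nsw, 4294967294 with nsw), here is a minimal Python sketch. It assumes plain 32-bit wraparound arithmetic and says nothing about what InstCombine does internally. The helper names (trunc32, original, combined) are mine, for illustration only.

```python
MASK32 = (1 << 32) - 1

def trunc32(v):
    """Model 'trunc ... to i32': keep only the low 32 bits."""
    return v & MASK32

def original(x):
    """sub i64 x, 2 ; trunc to i32 ; mul i32 ..., 8 (with 32-bit wraparound)."""
    return trunc32(trunc32(x - 2) * 8)

def combined(x, c):
    """add i64 x, c ; trunc to i32 ; shl i32 ..., 3 (with 32-bit wraparound)."""
    return trunc32(trunc32(x + c) << 3)

NO_NSW = 536870910     # constant on problematic line #2
WITH_NSW = 4294967294  # constant produced when 'mul nsw' is used

# 4294967294 is -2 modulo 2^32; 536870910 is the same value with its top 3 bits cleared.
print(WITH_NSW == (1 << 32) - 2)
print(NO_NSW == WITH_NSW & 0x1FFFFFFF)

# With the value actually stored in the example (3), compare the three forms.
print(original(3), combined(3, NO_NSW), combined(3, WITH_NSW))
```

Under that wraparound assumption the three forms agree for the stored value 3, so the difference between the two constants only shows up once a later pass stops treating the arithmetic as 32-bit modular.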

Digging into the source, I understood that the ‘sub’ is turned into an ‘add’ of the two's-complement constant, the ‘mul’ is turned into a shift, and, when nsw is not present, the shift has been propagated into the two's-complement constant to clear its highest 3 bits. To me this transformation seems invalid; can someone point me to where it occurs? The problem with such a transformation is that if I specify a datalayout for the target, then GVN optimizes it further into:

define void @t2(double* nocapture %x) nounwind {
L.entry:
%a = alloca [2 x i64], align 8
%0 = getelementptr inbounds [2 x i64]* %a, i32 0, i32 0
store i64 3, i64* %0, align 8
%1 = getelementptr [2 x i64]* %a, i32 0, i32 1
store i64 5, i64* %1, align 8
%2 = getelementptr [2 x i64]* %a, i32 0, i32 536870913
%3 = bitcast i64* %2 to double*
%4 = getelementptr double* %x, i32 1
store double undef, double* %4, align 4
ret void
}

This marks the value of the final store as ‘undef’, which is not correct if pointer arithmetic is 32-bit, since 536870913 * 8 mod 2^32 = 8.
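
To make the arithmetic concrete, here is a small Python sketch of the byte offset implied by the folded index. It assumes the offset wraps at the pointer width and that an i64 element is 8 bytes. gep_byte_offset is a hypothetical helper of mine, not an LLVM function.

```python
ELEM_SIZE = 8  # bytes per i64 element of [2 x i64]

def gep_byte_offset(index, elem_size=ELEM_SIZE, pointer_bits=32):
    """Byte offset of element 'index', wrapped to the pointer width (assumption)."""
    return (index * elem_size) % (1 << pointer_bits)

# The loaded value 3 plus the constant from problematic line #2.
folded_index = 3 + 536870910

print(folded_index)                                    # index seen in the GVN output
print(gep_byte_offset(folded_index))                   # 32-bit pointers
print(gep_byte_offset(1))                              # offset of %a[1]
print(gep_byte_offset(folded_index, pointer_bits=64))  # 64-bit pointers
```

With 32-bit wrapping, the folded index lands on the same byte offset as element 1 of %a, which holds the stored value 5. It only falls outside the array if the offset is computed without wrapping at 32 bits.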

Thanks for your help
Best Regards
Seb