Effect of the NSW attribute on 'mul' during the InstCombine pass?

Hi all,

I'm using LLVM 3.0, against which I've filed the following bug: http://llvm.org/bugs/show_bug.cgi?id=12130.
I'm trying to solve this problem myself by digging into the LLVM sources.
It seems that the problem I'm experiencing is related to the presence or absence of the NSW attribute on a 'mul'.
Consider the following code:

define void @t2(double* %x) {
L.entry:
  %a = alloca [2 x i64], align 4
  %0 = bitcast [2 x i64]* %a to i64*
  store i64 3, i64* %0
  %1 = getelementptr [2 x i64]* %a, i32 0, i32 1
  store i64 5, i64* %1
  %2 = bitcast [2 x i64]* %a to double*
  %3 = bitcast double* %2 to i8*
  %4 = load i64* %0
  %5 = sub i64 %4, 2
  %6 = trunc i64 %5 to i32
  %7 = mul i32 %6, 8 ; HERE is problematic line #1
  %8 = getelementptr i8* %3, i32 %7
  %9 = bitcast i8* %8 to double*
  %10 = load double* %9
  %11 = bitcast double* %x to i8*
  %12 = getelementptr i8* %11, i32 8
  %13 = bitcast i8* %12 to double*
  store double %10, double* %13
  ret void
}

If I run opt as follows:

opt -instcombine trb.ll -S -o trb.opt.ll

I get the following generated code:

; ModuleID = 'trb.ll'

define void @t2(double* %x) {
L.entry:
  %a = alloca [2 x i64], align 4
  %0 = getelementptr inbounds [2 x i64]* %a, i32 0, i32 0
  store i64 3, i64* %0
  %1 = getelementptr [2 x i64]* %a, i32 0, i32 1
  store i64 5, i64* %1
  %2 = bitcast [2 x i64]* %a to i8*
  %3 = load i64* %0
  %4 = add i64 %3, 536870910 ; Problematic line #2
  %5 = trunc i64 %4 to i32
  %6 = shl i32 %5, 3
  %7 = getelementptr i8* %2, i32 %6
  %8 = bitcast i8* %7 to double*
  %9 = load double* %8
  %10 = bitcast double* %x to i8*
  %11 = getelementptr i8* %10, i32 8
  %12 = bitcast i8* %11 to double*
  store double %9, double* %12
  ret void
}

If, on problematic line #1, I replace %7 = mul i32 %6, 8 with %7 = mul nsw i32 %6, 8, then opt generates:

; ModuleID = 'trb.ll'

define void @t2(double* %x) {
L.entry:
  %a = alloca [2 x i64], align 4
  %0 = getelementptr inbounds [2 x i64]* %a, i32 0, i32 0
  store i64 3, i64* %0
  %1 = getelementptr [2 x i64]* %a, i32 0, i32 1
  store i64 5, i64* %1
  %2 = bitcast [2 x i64]* %a to i8*
  %3 = load i64* %0
  %4 = add i64 %3, 4294967294
  %5 = trunc i64 %4 to i32
  %6 = shl nsw i32 %5, 3
  %7 = getelementptr i8* %2, i32 %6
  %8 = bitcast i8* %7 to double*
  %9 = load double* %8
  %10 = bitcast double* %x to i8*
  %11 = getelementptr i8* %10, i32 8
  %12 = bitcast i8* %11 to double*
  store double %9, double* %12
  ret void
}
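
As a quick sanity check of the two add constants in the outputs above, here is a small Python sketch of the bit arithmetic only. It is not LLVM code, and the names NO_NSW, WITH_NSW, TOP3 and low32 are made up for illustration:

```python
MASK32 = 0xFFFFFFFF          # i32 values wrap modulo 2**32

NO_NSW = 536870910           # constant emitted when 'mul' has no nsw
WITH_NSW = 4294967294        # constant emitted when 'mul' has nsw
TOP3 = 0b111 << 29           # the three highest bits of a 32-bit value

def low32(value):
    """Keep only the low 32 bits, as 'trunc ... to i32' does."""
    return value & MASK32

# -2 seen as a 32-bit two's-complement pattern is the nsw constant.
print(hex(low32(-2)))                 # 0xfffffffe
# Clearing the top three bits of that pattern gives the non-nsw constant.
print(hex(WITH_NSW & ~TOP3 & MASK32)) # 0x1ffffffe
# The two constants differ in exactly those three bits.
print((NO_NSW ^ WITH_NSW) == TOP3)    # True
```

So the non-nsw constant is the 32-bit pattern of -2 with its three highest bits cleared, which are exactly the bits a 32-bit left shift by 3 discards.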

Digging into the source, I understood that the 'sub' is turned into an 'add' of the two's-complement constant, the 'mul' is turned into a shift, and, when nsw is not present, the shift operation has been propagated into that constant, clearing its highest 3 bits. To me this transformation seems invalid; can someone point me to where it occurs? The problem with such a transformation is that if I specify a datalayout for the target, then GVN optimizes it further into:

define void @t2(double* nocapture %x) nounwind {
L.entry:
  %a = alloca [2 x i64], align 8
  %0 = getelementptr inbounds [2 x i64]* %a, i32 0, i32 0
  store i64 3, i64* %0, align 8
  %1 = getelementptr [2 x i64]* %a, i32 0, i32 1
  store i64 5, i64* %1, align 8
  %2 = getelementptr [2 x i64]* %a, i32 0, i32 536870913
  %3 = bitcast i64* %2 to double*
  %4 = getelementptr double* %x, i32 1
  store double undef, double* %4, align 4
  ret void
}

Thus the final store is marked as storing an 'undef' value, which is not correct if pointer arithmetic is 32-bit, since 536870913 * 8 mod 2^32 = 8.
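
To make that arithmetic concrete, here is a minimal Python sketch. It only models the byte offset of an array index wrapped to a given pointer width; the helper gep_byte_offset is hypothetical and is not how LLVM itself computes GEP offsets:

```python
def gep_byte_offset(index, elem_size, pointer_bits):
    """Byte offset of element `index`, wrapped to the pointer width.

    Assumption: the offset is computed modulo 2**pointer_bits.
    """
    return (index * elem_size) % (1 << pointer_bits)

INDEX = 536870913   # array index that appears in the GVN output
ELEM_SIZE = 8       # size of an i64 element in bytes

# With 32-bit pointer arithmetic the offset wraps to 8, i.e. element 1.
print(gep_byte_offset(INDEX, ELEM_SIZE, 32))   # 8
print(gep_byte_offset(1, ELEM_SIZE, 32))       # 8
# Without wrapping at 32 bits the offset is far outside the array.
print(gep_byte_offset(INDEX, ELEM_SIZE, 64))   # 4294967304
```

Under the 32-bit assumption, index 536870913 and index 1 give the same byte offset, so the load would read the value 5 stored at %1 rather than something undefined.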

Thanks for your help
Best Regards
Seb