algebraic (de)optimizations form long chains of dependent operations


when I compile (clang -O3) and optimize (opt -O3) C-code like this:

    sum1 = val1 + val2;
    sum2 = val3 + val4;
    sum3 = val5 + val6;
    sum4 = val7 + val8;
    sum5 = sum1 + sum2;
    sum6 = sum3 + sum4;
    sum7 = sum5 + sum6;
    sum += sum7;

I get bitcode like this:

if.end152: ; preds = %if.then150, %if.else146, %if.end137
  %val8.0 = phi i32 [ -2048, %if.then150 ], [ %conv38, %if.else146 ], [ 2047, %if.end137 ]
  %conv154 = trunc i32 %val8.0 to i16
  store i16 %conv154, i16* %add.ptr36, align 2
  %add = add i32 %val1.0, %sum.1
  %add161 = add i32 %add, %val2.0
  %add170 = add i32 %add161, %val3.0
  %add167 = add i32 %add170, %val4.0
  %add173 = add i32 %add167, %val5.0
  %add164 = add i32 %add173, %val6.0
  %add176 = add i32 %add164, %val7.0
  %add179 = add i32 %add176, %val8.0
  %cmp181 = icmp eq i32 %rem, 56
  br i1 %cmp181, label %if.then183, label

The problem with this bitcode is that all additions form one long chain of dependent additions, rather than a tree of additions as in the original source code. For parallel architectures like VLIW architectures, this is detrimental for performance. Any idea how I can avoid this?


prof. Bjorn De Sutter
Computer Systems Lab
Ghent University