Which transform passes to apply?

Hello, I'm a new LLVM user working on a C++ EDSL for image processing. I have
a function which, after applying createInstructionCombiningPass() and
createDeadCodeEliminationPass(), looks like this:

define void @jitcv_sum_64sf1001(%Matrix* %src, %Matrix* %dst, i32 %len) {
entry:
  br label %loop_i

loop_i: ; preds = %loop_i_end, %entry
  %i = phi i32 [ 0, %entry ], [ %increment_i, %loop_i_end ]
  %0 = getelementptr inbounds %Matrix* %dst, i32 0, i32 2
  %dst_columns = load i32* %0
  %dst_yRem = urem i32 %i, %dst_columns
  %dst_y = urem i32 %i, %dst_columns
  %1 = sub i32 %i, %dst_y
  %2 = add i32 %1, %dst_yRem
  %3 = getelementptr inbounds %Matrix* %src, i32 0, i32 0
  %4 = load i8** %3
  %src_data = bitcast i8* %4 to double*
  %5 = getelementptr double* %src_data, i32 %2
  %6 = load double* %5
  %accumulate = fadd double %6, 0.000000e+00
  %7 = getelementptr inbounds %Matrix* %dst, i32 0, i32 0
  %8 = load i8** %7
  %dst_data = bitcast i8* %8 to double*
  %9 = getelementptr double* %dst_data, i32 %i
  store double %accumulate, double* %9
  br label %loop_i_end

loop_i_end: ; preds = %loop_i
  %increment_i = add i32 %i, 1
  %loop_i_test = icmp eq i32 %increment_i, %len
  br i1 %loop_i_test, label %loop_i_exit, label %loop_i

loop_i_exit: ; preds = %loop_i_end
  ret void
}

My question is which optimization pass(es) are needed to simplify the
redundant instructions above (the duplicated urem feeding %dst_yRem and
%dst_y, and the %accumulate = fadd of 0.0). I've tried running the same
passes again and also tried createInstructionSimplifierPass(), with no luck.

Many thanks,

Josh

Hi Josh,

Hello, I'm a new LLVM user working on a C++ EDSL for image processing. I have
a function which, after applying createInstructionCombiningPass() and
createDeadCodeEliminationPass(), looks like this:

...

   %dst_yRem = urem i32 %i, %dst_columns
   %dst_y = urem i32 %i, %dst_columns
   %1 = sub i32 %i, %dst_y
   %2 = add i32 %1, %dst_yRem

...

My question is which optimization pass(es) are needed to simplify the
redundant instructions above (the duplicated urem feeding %dst_yRem and
%dst_y, and the %accumulate = fadd of 0.0). I've tried running the same
passes again and also tried createInstructionSimplifierPass(), with no luck.

GVN or EarlyCSE.
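
Roughly, with the 3.1-era C++ pass manager API, that would be wired in
something like this (untested sketch; optimizeFunction and the Module/Function
pointers are whatever your JIT front end already has):

#include "llvm/PassManager.h"
#include "llvm/Transforms/Scalar.h"

// Untested sketch against the 3.1-era legacy pass manager: run EarlyCSE (or
// GVN) over each JITed function to remove the duplicated urem, then let
// instcombine/DCE tidy up the leftovers.
void optimizeFunction(llvm::Module *M, llvm::Function *F) {
  llvm::FunctionPassManager FPM(M);
  FPM.add(llvm::createEarlyCSEPass());              // or llvm::createGVNPass()
  FPM.add(llvm::createInstructionCombiningPass());
  FPM.add(llvm::createDeadCodeEliminationPass());
  FPM.doInitialization();
  FPM.run(*F);
  FPM.doFinalization();
}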

Ciao, Duncan.

Thanks Duncan, GVN/EarlyCSE worked as suggested. Any pointers on how to
optimize out:
%accumulate = fadd double %6, 0.000000e+00

Using the 3.1 release and the C++ API, I can't figure out how FPMathOperator,
TargetOptions, or IRBuilder::SetDefaultFPMathTag are supposed to work. I also
don't see any floating-point math transformation passes. I did see
IRBuilder::SetFastMathFlags; do I need to update to 3.2 and use that call?

Hi,

If you want LLVM to do this you'll definitely have to enable unsafe-math
flags, since the transformation isn't strictly valid in IEEE floating point
(the only web ref I can find quickly is ...). Duncan has been doing some work
implementing stuff in that area that I haven't kept up with. However, it might
be worth tracking the no-op operations you want removed at your own DSL level,
so that you've got more control over when these fp-unsafe optimizations are
applied. (I was doing something with automatic differentiation, which throws
up lots of "multiply by 1"s, "add 0"s, etc., about a year ago and found this
was the way to go then; but as mentioned, there's been some activity in the
area that I haven't been following.)
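
As a rough illustration of the DSL-level approach (just a sketch; emitAdd is a
hypothetical helper, not code from either of our projects):

#include "llvm/Constants.h"    // llvm/IR/Constants.h in later releases
#include "llvm/IRBuilder.h"    // llvm/Support/IRBuilder.h in 3.1

// Hypothetical helper in the EDSL's code generator: drop the add-of-zero
// before it ever reaches LLVM. The signed-zero corner case is accepted here
// by choice, rather than asking LLVM to do it under fast-math.
llvm::Value *emitAdd(llvm::IRBuilder<> &B, llvm::Value *LHS, llvm::Value *RHS) {
  if (llvm::ConstantFP *C = llvm::dyn_cast<llvm::ConstantFP>(RHS))
    if (C->isZero())                // matches both +0.0 and -0.0
      return LHS;                   // x + 0.0 handled at the DSL level
  return B.CreateFAdd(LHS, RHS);
}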

Regards,
Dave

Hi Josh,

Thanks Duncan, GVN/EarlyCSE worked as suggested. Any pointers on how to
optimize out:
%accumulate = fadd double %6, 0.000000e+00

instcombine. However, this transform is only correct if you are adding -0.0,
so you won't get it without "fast math", and in 3.2 I think it will only be
done by codegen (llc). The situation should be better in 3.3, since a bunch of
fast-math support has started going into the IR-level transforms.

Using the 3.1 release and the C++ API, I can't figure out how FPMathOperator,
TargetOptions, or IRBuilder::SetDefaultFPMathTag are supposed to work. I also
don't see any floating-point math transformation passes. I did see
IRBuilder::SetFastMathFlags; do I need to update to 3.2 and use that call?

Please open a bug report since even in mainline this transform isn't done yet
(as fast-math support at the IR level only just got going).
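
That said, once the IR-level support is there, the IRBuilder route you mention
looks roughly like this (untested sketch; the header locations moved between
releases):

#include "llvm/IR/IRBuilder.h"   // "llvm/IRBuilder.h" in 3.2
#include "llvm/IR/Operator.h"    // FastMathFlags; "llvm/Operator.h" in 3.2

// Untested sketch: mark everything built through this IRBuilder as fast-math,
// so the fadd/fsub/... instructions it creates carry the flags that let the
// optimizers treat x + 0.0 as x.
void enableFastMath(llvm::IRBuilder<> &Builder) {
  llvm::FastMathFlags FMF;
  FMF.setUnsafeAlgebra();
  Builder.SetFastMathFlags(FMF);
}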

Ciao, Duncan.

Duncan Sands wrote:

Hi Josh,

instcombine. However, this transform is only correct if you are adding -0.0,
so you won't get it without "fast math", and in 3.2 I think it will only be
done by codegen (llc). The situation should be better in 3.3, since a bunch of
fast-math support has started going into the IR-level transforms.

...

Please open a bug report since even in mainline this transform isn't done yet
(as fast-math support at the IR level only just got going).

Ciao, Duncan.

Filed as http://llvm.org/bugs/show_bug.cgi?id=14513. The -0.0 trick (seeding
the accumulator with -0.0 instead of +0.0, roughly as sketched below) did the
job in my case. Thanks again!
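
For the record, the seed change amounts to something like this (illustrative
sketch; Builder and the loaded value come from the EDSL's code generator):

#include "llvm/Constants.h"
#include "llvm/IRBuilder.h"    // llvm/Support/IRBuilder.h in 3.1

// fadd x, -0.0 is exactly x for every x under IEEE-754, so instcombine can
// remove the seeded add without any fast-math flags.
llvm::Value *seedAccumulator(llvm::IRBuilder<> &Builder, llvm::Value *Loaded) {
  llvm::Value *NegZero = llvm::ConstantFP::get(
      llvm::Type::getDoubleTy(Builder.getContext()), -0.0);
  return Builder.CreateFAdd(Loaded, NegZero, "accumulate");
}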