Additional Optimization I'm Missing?

Hello, I’m working on using the LLVM JIT for a little project of mine, amazing work first off! I have a question about optimization passes. I initially have this function I’ve created, in python it looks like this:

OS_end = 50
OS_start = 0
OS_timestep = 1
birth_rate = .3
population = 30.0

for time in range(OS_start, OS_end, OS_timestep):
    births = birth_rate * population
    deaths = 0.1 * population
    population_NEXT = population + births - deaths

    #generally put print statements here

    #updating stocks
    population = population_NEXT

Then I can turn it into LLVM IR:

define void @simulate() {
entry:
	%time = alloca double		; <double*> [#uses=4]
	%population_next = alloca double		; <double*> [#uses=2]
	%population = alloca double		; <double*> [#uses=5]
	%net_change = alloca double		; <double*> [#uses=2]
	%deaths = alloca double		; <double*> [#uses=2]
	%births = alloca double		; <double*> [#uses=2]
	%birth_rate = alloca double		; <double*> [#uses=2]
	%OS_timestep = alloca double		; <double*> [#uses=2]
	%OS_start = alloca double		; <double*> [#uses=2]
	%OS_end = alloca double		; <double*> [#uses=2]
	store double 5.000000e+01, double* %OS_end
	store double 0.000000e+00, double* %OS_start
	store double 1.000000e+00, double* %OS_timestep
	store double 3.000000e-01, double* %birth_rate
	store double 3.000000e+01, double* %population
	%OS_start1 = load double* %OS_start		; <double> [#uses=1]
	store double %OS_start1, double* %time
	br label %forcond

forcond:		; preds = %forinc, %entry
	%time2 = load double* %time		; <double> [#uses=1]
	%OS_end3 = load double* %OS_end		; <double> [#uses=1]
	%forcond4 = fcmp olt double %time2, %OS_end3		; <i1> [#uses=1]
	br i1 %forcond4, label %forbody, label %forafter

forbody:		; preds = %forcond
	%birth_rate5 = load double* %birth_rate		; <double> [#uses=1]
	%population6 = load double* %population		; <double> [#uses=1]
	%multmp = mul double %birth_rate5, %population6		; <double> [#uses=1]
	store double %multmp, double* %births
	%population7 = load double* %population		; <double> [#uses=1]
	%multmp8 = mul double 1.000000e-01, %population7		; <double> [#uses=1]
	store double %multmp8, double* %deaths
	%births9 = load double* %births		; <double> [#uses=1]
	%deaths10 = load double* %deaths		; <double> [#uses=1]
	%subtmp = sub double %births9, %deaths10		; <double> [#uses=1]
	store double %subtmp, double* %net_change
	%population11 = load double* %population		; <double> [#uses=1]
	%net_change12 = load double* %net_change		; <double> [#uses=1]
	%addtmp = add double %population11, %net_change12		; <double> [#uses=1]
	store double %addtmp, double* %population_next
	br label %updatestocks

updatestocks:		; preds = %forbody
	%population_next13 = load double* %population_next		; <double> [#uses=1]
	store double %population_next13, double* %population
	br label %forinc

forinc:		; preds = %updatestocks
	%time14 = load double* %time		; <double> [#uses=1]
	%OS_timestep15 = load double* %OS_timestep		; <double> [#uses=1]
	%new_time = add double %time14, %OS_timestep15		; <double> [#uses=1]
	store double %new_time, double* %time
	br label %forcond

forafter:		; preds = %forcond
	ret void
}


And then I run the optimizations from the Kaleidoscope tutorial (mem2reg, predsimplify, instcombine, reassociate, gvn, simplifycfg) and get the following:

define void @simulate() {
entry:
	br label %forcond

forcond:		; preds = %forbody, %entry
	%population.0 = phi double [ 3.000000e+01, %entry ], [ %addtmp, %forbody ]		; <double> [#uses=3]
	%time.0 = phi double [ 0.000000e+00, %entry ], [ %new_time, %forbody ]		; <double> [#uses=2]
	%forcond4 = fcmp olt double %time.0, 5.000000e+01		; <i1> [#uses=1]
	br i1 %forcond4, label %forbody, label %forafter

forbody:		; preds = %forcond
	%multmp = mul double %population.0, 3.000000e-01		; <double> [#uses=1]
	%multmp8 = mul double %population.0, 1.000000e-01		; <double> [#uses=1]
	%subtmp = sub double %multmp, %multmp8		; <double> [#uses=1]
	%addtmp = add double %population.0, %subtmp		; <double> [#uses=1]
	%new_time = add double %time.0, 1.000000e+00		; <double> [#uses=1]
	br label %forcond

forafter:		; preds = %forcond
	ret void
}

The thing is, in forbody it is still doing:
subtmp = population + (population * .3) - (population * .1)

ideally I would love to see it reduced to:
subtmp = 1.2 * population

I’ve played around with adding a bunch of optimizations that sound good from [http://www.llvm.org/docs/Passes.html], but I must admit I’m a bit over my head here. Does anyone have any suggestions? Am I missing passes, do I need to restructure the IR, or am I missing some concept?

thanks! (keep up the amazing work)
yours,
Bobby Powers

I expect that such a transformation is not safe in general thanks to
rounding errors, and as such isn't applied. AFAIK, compilers usually
don't reorganize floating point operations.

Have you tried it on other compilers? I expect it'll never happen
without some imprecise math option (like gcc's -ffast-math).

Hello, I’m working on using the LLVM JIT for a little project of mine, amazing work first off! I have a question about optimization passes. I initially have this function I’ve created, in python it looks like this:

OS_end = 50
OS_start = 0
OS_timestep = 1
birth_rate = .3
population = 30.0

for time in range(OS_start, OS_end, OS_timestep):
    births = birth_rate * population
    deaths = 0.1 * population
    population_NEXT = population + births - deaths


The thing is, in forbody it is still doing:
subtmp = population + (population * .3) - (population * .1)

ideally I would love to see it reduced to:
subtmp = 1.2 * population

This transformation changes the result due to roundoff errors, and would be incorrect.
In C using %lf the printed values of the 2 expressions diverge on the 84th iteration.

Other people have already addressed the specific case of this example, but, in general, the passes use by the opt tool in -std-compile-opts are considered to be a good set for unit-at-a-time optimization, and similarly the ones used in llvm-ld are considered good (in addition to the single-unit ones) for whole-program optimization.

Also note that predsimplify is incomplete and has some known bugs. I would recommend against using it.

–Owen

thanks for the help, I suppose I’m yet not use to thinking of rounding errors 84 iterations out :slight_smile:

yours,
Bobby Powers

thanks for the help, I suppose I'm yet not use to thinking of rounding
errors 84 iterations out :slight_smile:

llc has an -enable-unsafe-fp-math option, while gcc has -fast-math, for
people who don't care about such details.