llc -O# / opt -O# differences

Hey everyone,

I'm running the stock LLVM 3.1 release. Both the llc and opt programs
take -O# arguments; however, the results look somewhat different.
Here's a silly, unoptimized bit of code that I'm generating:

; ModuleID = 'foo'

%Coord = type { double, double, double }

define double @foo(%Coord*, %Coord*) nounwind uwtable ssp {
entry:
  %dx_ptr = alloca double
  %b_ptr = alloca %Coord*
  %a_ptr = alloca %Coord*
  store %Coord* %0, %Coord** %a_ptr
  store %Coord* %1, %Coord** %b_ptr
  %a = load %Coord** %a_ptr
  %addr = getelementptr %Coord* %a, i64 0
  %2 = getelementptr inbounds %Coord* %addr, i32 0, i32 0
  %3 = load double* %2
  %b = load %Coord** %b_ptr
  %addr1 = getelementptr %Coord* %b, i64 0
  %4 = getelementptr inbounds %Coord* %addr1, i32 0, i32 0
  %5 = load double* %4
  %sub = fsub double %3, %5
  store double %sub, double* %dx_ptr
  %dx = load double* %dx_ptr
  %dx2 = load double* %dx_ptr
  %mult = fmul double %dx, %dx2
  ret double %mult
}

This roughly corresponds to the following C code:

struct Coord { double x; double y; double z; };

double foo(struct Coord * a, struct Coord * b) {
    double dx = a[0].x - b[0].x;
    return dx * dx;
}

Running it through opt:

$ llvm-as < x.ll | opt -O3 | llc > y.s

Produces the following:

_foo: ## @foo
    .cfi_startproc
## BB#0: ## %entry
    movsd (%rdi), %xmm0
    subsd (%rsi), %xmm0
    mulsd %xmm0, %xmm0
    ret
    .cfi_endproc

This also matches what clang produces for the C function. However,
running the same IR through llc with the same optimization flag:

$ llc -O3 x.ll -o z.s

_foo: ## @foo
    .cfi_startproc
## BB#0: ## %entry
    movq %rdi, -24(%rsp)
    movq %rsi, -16(%rsp)
    movq -24(%rsp), %rax
    movsd (%rax), %xmm0
    subsd (%rsi), %xmm0
    movsd %xmm0, -8(%rsp)
    mulsd %xmm0, %xmm0
    ret
    .cfi_endproc

This matches what I get from LLVMCreateTargetMachine with
LLVMCodeGenLevelAggressive followed by LLVMTargetMachineEmitToFile,
which is what I'm actually using.
Is the llc/opt difference expected? I'm a bit confused, since I'd
expect the same -O level to run the same optimization passes. I have
to admit I'm not well versed in assembly, but to me it looks like the
opt output eliminates a bunch of stack loads and stores. I'd
appreciate any insight into this.

Thanks,

Dimitri

Dimitri Tcaciuc wrote:

Is the llc/opt difference expected?

Yes. "opt" runs the optimizers, which take LLVM IR as input and produce LLVM IR as output. "opt -O2 -debug-pass=Arguments" will show you a list of the individual optimizations (and analyses) that opt -O2 performs. It's possible to run them individually (e.g. opt -scalarrepl -instcombine) to build a list that's better suited to your own compiler, but -O2 has the ones we think are good for a C/C++ compiler.
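For illustration, here is a rough C-level analogy of the IR-to-IR rewrite that passes such as -scalarrepl or -mem2reg apply to the example above: values that the input IR keeps in stack slots (the %a_ptr, %b_ptr and %dx_ptr allocas) become plain SSA values that can live in registers. The function names are hypothetical, and `volatile` is only there to stop the C compiler from doing that promotion on its own; this is an analogy, not the code LLVM actually emits.

```c
struct Coord { double x; double y; double z; };

/* Before promotion: every local lives in a memory slot, like the allocas
   in the input IR. volatile forces each load and store to be kept,
   standing in for slots that no IR pass has promoted. */
double foo_with_slots(struct Coord *a_in, struct Coord *b_in) {
    struct Coord *volatile a_slot = a_in;  /* %a_ptr */
    struct Coord *volatile b_slot = b_in;  /* %b_ptr */
    volatile double dx_slot;               /* %dx_ptr */

    dx_slot = a_slot[0].x - b_slot[0].x;   /* store double %sub */
    double dx = dx_slot;                   /* load -> %dx */
    double dx2 = dx_slot;                  /* load -> %dx2 */
    return dx * dx2;                       /* %mult */
}

/* After promotion: the slots are gone and dx is a single SSA value. */
double foo_promoted(struct Coord *a, struct Coord *b) {
    double dx = a[0].x - b[0].x;           /* %sub */
    return dx * dx;                        /* %mult */
}
```

Both functions return 9.0 for a = {5, 1, 2} and b = {2, 7, 9}. The only difference is how much memory traffic is left for the code generator to deal with, which is roughly the difference between the two assembly listings above.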

"llc" is the code generator, which takes LLVM IR in and produces machine code. There are some places in the code generator where it has the choice between spending compile time to produce good code, or getting the code out quickly, and the -O flag to llc specifies that choice. For example, you can do register allocation by trying to find the assignment of registers that minimizes the number of spills, or you can just hand out registers in order starting from the first, spilling when one is already in use. Any optimizations llc does are things that can't possibly happen in an IR-to-IR pass (since the IR is in SSA form, we can't do register allocation there).

If you want optimized code, you'd run the IR optimizers and ask the code generator to spend time producing good code. Or if you want unoptimized code, don't run any IR optimizers and ask the code generator to produce code as quickly as it can. You can of course choose some other combination by running opt and llc yourself, as you noticed.

Nick

Great, thanks for the info!

So, to extrapolate (referring to the LLVM C bindings): running a
PassManager populated by a PassManagerBuilder at its own OptLevel
takes care of a different category of optimizations, and won't step
on what the TargetMachine's CodeGenLevel argument and
LLVMTargetMachineEmitToFile are working on?

Dimitri.