llvm-ld optimization options

I have been wondering why llvm-ld generates the same code with or
without the option "-O5" so I looked at its source (llvm 2.2). And
apparently, the options "-On" are accepted but never used! The program
runs a fixed set of optimization passes, unless "-disable-opt" is
specified. What is the reason for this? If this is intended, then the
documentation should say something about this IMHO.

Regards,
Kevin André

> I have been wondering why llvm-ld generates the same code with or
> without the option "-O5" so I looked at its source (llvm 2.2). And
> apparently, the options "-On" are accepted but never used! The program
> runs a fixed set of optimization passes, unless "-disable-opt" is
> specified. What is the reason for this? If this is intended, then the
> documentation should say something about this IMHO.

Probably the standard answer: "because no one has done it yet" :-)

-Chris

I don't want to sound too much like advertising our work, but COLE (see the CGO paper at http://www.elis.ugent.be/~kehoste) seems to be extremely well suited to figure out which passes should go where for llvm-ld... Same goes for llvm-gcc and opt btw :-)

I hope I can find the time sometime soon to work on this...

greetings,

Kenneth

Hi Kenneth,

>>> I have been wondering why llvm-ld generates the same code with or
>>> without the option "-O5" so I looked at its source (llvm 2.2). And
>>> apparently, the options "-On" are accepted but never used! The program
>>> runs a fixed set of optimization passes, unless "-disable-opt" is
>>> specified. What is the reason for this? If this is intended, then the
>>> documentation should say something about this IMHO.
>>
>> Probably the standard answer: "because no one has done it yet" :-)

> I don't want to sound too much like advertising our work, but COLE
> (see the CGO paper at http://www.elis.ugent.be/~kehoste) seems to be
> extremely well suited to figure out which passes should go where for
> llvm-ld... Same goes for llvm-gcc and opt btw :-)
>
> I hope I can find the time sometime soon to work on this...

this sounds great! You should concentrate on opt, since with that
you can schedule any passes you like (often multiple times) in any
order. There should be no difference between using llvm-gcc at some
-O level, and running it at -O0 and using opt to run the passes on the
unoptimized bitcode. Likewise for llvm-ld: all it does is link the
supplied bitcode into one monster bitcode module, run a bunch of passes
on it, then turn the result into object code. You get exactly the same
effect by linking the bitcode without optimization (using llvm-link),
using opt to run the list of passes on that, then turning the resulting
optimized bitcode into object code.
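The equivalence Duncan describes can be sketched as two command pipelines. The file names are hypothetical, the tool spellings are those of the LLVM 2.x era (llvm-gcc, llvm-ld, llvm-link, llc), and `-std-link-opts` is assumed to name llvm-ld's fixed pass list as opt exposes it:

```shell
# One-step: llvm-ld links the modules and runs its fixed pass list.
llvm-gcc -O3 -c -emit-llvm a.c -o a.bc
llvm-gcc -O3 -c -emit-llvm b.c -o b.bc
llvm-ld a.bc b.bc -o prog

# Equivalent three-step pipeline: link without optimizing, run the
# passes explicitly through opt, then turn the result into object code.
llvm-link a.bc b.bc -o linked.bc
opt -std-link-opts linked.bc -o linked.opt.bc
llc linked.opt.bc -o linked.s
```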

Where llvm-ld differs from llvm-gcc is in the kind of bitcode that
the optimizations need to be run on. The usual procedure is to
run llvm-gcc (with optimization) on each compilation unit, producing
bitcode. Then all the bitcode files for the program are passed to
llvm-ld for link-time optimization. So when running llvm-ld, passes
are run on a module where each function has already been maximally
optimized, as far as local optimizations go. The difference is that
now you have bodies for all functions so inter-procedural passes
(inline, argument promotion, ip constant propagation, elimination
of unused functions etc) become much more relevant and effective.
In short, llvm-gcc and llvm-ld represent two different classes of
bitcode being fed into opt, so presumably different sets of passes
are appropriate.
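The usual procedure above might look like the following sketch. The pass names are the spellings opt used for the inter-procedural passes Duncan lists (inlining, argument promotion, IP constant propagation, dead global elimination); file names are made up for illustration:

```shell
# Per-unit compilation: each translation unit is locally optimized
# on its own, with no knowledge of the other units' function bodies.
llvm-gcc -O3 -c -emit-llvm a.c -o a.bc
llvm-gcc -O3 -c -emit-llvm b.c -o b.bc

# Link everything into one monster module; now every callee's body is
# visible, so the inter-procedural passes have something to work with.
llvm-link a.bc b.bc -o whole-prog.bc
opt -inline -argpromotion -ipconstprop -globaldce \
    whole-prog.bc -o whole-prog.opt.bc
```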

What would be great is to compute an optimal set of passes and -O
levels for llvm-gcc and llvm-ld. It makes sense to start with llvm-gcc.
Presumably what you would do is take a large body of code (spec etc),
and compile each compilation unit down to bitcode using llvm-gcc at
-O0. These bitcode files would be input to COLE. What COLE would
have to do is run opt on the bitcode with varying selections of passes,
and come up with optimal sets of passes based on various criteria
(execution speed, bitcode size [this one is the easiest to do, maybe
you should start here], etc), which can then be turned into -O levels
for llvm-gcc. I think you can assume that llvm-ld will be run at the
same -O level as llvm-gcc. So once you have llvm-gcc sorted, the next
step would be to compile all the benchmarks using llvm-gcc at -Owhatever,
and llvm-link the result into a mega-monster module. This results in
one module for each benchmark. Then COLE can schedule various passes on
the monster modules to see what is most effective for this link-time
optimization. If you used criteria X,Y and Z to determine a good set of
passes for -Owhatever, probably the same X,Y and Z should be used to
determine the llvm-ld passes at -Owhatever.
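The search Duncan is proposing can be caricatured in a few lines: enumerate candidate pass sequences, score each by some criterion, keep the best. The sketch below uses a made-up stand-in cost function in place of actually invoking opt and measuring bitcode size, so that it is self-contained; a real harness would shell out to `opt` per candidate, and COLE itself does a multi-objective search rather than this brute-force loop:

```python
import itertools

# Candidate passes to choose from (names as spelled for opt in LLVM 2.x).
PASSES = ["-simplifycfg", "-mem2reg", "-scalarrepl", "-instcombine"]

def bitcode_size(sequence):
    """Stand-in cost function. A real harness would run
    `opt <sequence> input.bc -o out.bc` and return the size of out.bc;
    here we fake a deterministic cost so the sketch runs anywhere."""
    size = 1000
    seen = set()
    for i, p in enumerate(sequence):
        if p not in seen:       # first application of a pass shrinks the module
            size -= 50 - 5 * i  # a pass placed earlier subtracts more
            seen.add(p)
    return size

def best_sequence(max_len=3):
    """Brute force: try every ordering of up to max_len distinct passes
    and keep the cheapest one found."""
    best, best_cost = (), bitcode_size(())
    for n in range(1, max_len + 1):
        for seq in itertools.permutations(PASSES, n):
            cost = bitcode_size(seq)
            if cost < best_cost:
                best, best_cost = seq, cost
    return best, best_cost
```

Swapping `bitcode_size` for "execution time of the compiled benchmark" gives the other criterion mentioned above; the search loop is unchanged, which is why starting with bitcode size is the easy case.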

Ciao,

Duncan.

> What would be great is to compute an optimal set of passes and -O
> levels for llvm-gcc and llvm-ld. It makes sense to start with llvm-gcc.
> Presumably what you would do is take a large body of code (spec etc),
> and compile each compilation unit down to bitcode using llvm-gcc at
> -O0. These bitcode files would be input to COLE. What COLE would
> have to do is run opt on the bitcode with varying selections of passes,
> and come up with optimal sets of passes based on various criteria
> (execution speed, bitcode size [this one is the easiest to do, maybe
> you should start here], etc), which can then be turned into -O levels
> for llvm-gcc. I think you can assume that llvm-ld will be run at the
> same -O level as llvm-gcc.

I would guess that an upcoming COLE optimizer should target the raw,
unoptimized bitcode output of any frontend, such as gcc, clang, or
frontends for other languages, right?

> There should be no difference between using llvm-gcc at some -O
> level, and running it at -O0 and using opt to run the passes on
> the unoptimized bitcode.

However, you wrote earlier:

> Finally, llvm-gcc runs the following passes on each function
> immediately after it is created:
>
>   CFGSimplification, PromoteMemoryToRegister,
>   ScalarReplAggregates, InstructionCombining.

AFAIK this isn't something that opt can do.

Or are these passes always run by llvm-gcc, even with -O0?

Hi Holger,

> There should be no difference between using llvm-gcc at some -O
> level, and running it at -O0 and using opt to run the passes on
> the unoptimized bitcode.

However, you wrote earlier:

> Finally, llvm-gcc runs the following passes on each function
> immediately after it is created:
>
> CFGSimplification, PromoteMemoryToRegister,
> ScalarReplAggregates, InstructionCombining.

AFAIK this isn't something that opt can do.

literally speaking opt can't run those on functions as they are
output, because it gets passed the final module. But since those
passes are per-function, running
  opt -simplifycfg -mem2reg -scalarrepl -instcombine
on the unoptimized module should run those passes on each function,
giving the same result. So llvm-gcc -O3 -funroll-loops should be
the same as:
  opt -simplifycfg -mem2reg -scalarrepl -instcombine -std-compile-opts
Only it's not going to be, because for some strange reason opt likes to
put the extra arguments at the back, so in fact it will first run
-std-compile-opts and then -simplifycfg -mem2reg -scalarrepl -instcombine,
d'oh! Also, when run at -O1 or better, gcc itself does some folding and
other small optimizations, so the IR that gets fed to the LLVM passes is
actually different to what you get at -O0.
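One way around opt putting the extra arguments at the back is to split the work across two opt invocations, so the per-function clean-up passes demonstrably run first. This sketch assumes opt accepts `-o -` to write bitcode to stdout and reads bitcode on stdin, as the 2.x tools did; the file names are hypothetical:

```shell
# First invocation: the passes llvm-gcc would have run per function.
# Second invocation: the -O3-style list, now guaranteed to run after.
llvm-gcc -O0 -c -emit-llvm foo.c -o foo.bc
opt -simplifycfg -mem2reg -scalarrepl -instcombine foo.bc -o - \
  | opt -std-compile-opts -o foo.opt.bc
```

Note this still won't be bit-identical to `llvm-gcc -O3` output, for the reason given above: at -O1 and higher gcc itself folds and simplifies before the IR ever reaches the LLVM passes.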

Ciao,

Duncan.