How to produce the same result as clang++ -Oz through opt -Oz

Dear developers,

I am trying to run the ‘opt’ command with different optimization levels (-O3 and -Oz) on an IR file to reproduce the results of applying clang++ -O3 and -Oz directly to a C++ source file.

  1. First, I use the following commands to produce the result of opt -O3:

clang++ -O3 -Xclang -disable-llvm-optzns -emit-llvm -c raytracer.cpp -o raytracer.bc

opt -O3 raytracer.bc -o tmp.bc

llc -O3 tmp.bc -o tmp.s

clang++ tmp.s -o tmp.out

  2. Then I use clang++ -O3 directly:

clang++ -O3 raytracer.cpp -o raytracer.out

  3. Finally, I compare the two files tmp.out and raytracer.out:

diff tmp.out raytracer.out

We find that the two files are exactly the same.

However, things are different at the -Oz level.

  1. I use the following commands to produce the result of opt -Oz:

clang++ -Oz -Xclang -disable-llvm-optzns -emit-llvm -c raytracer.cpp -o raytracer.bc

opt -Oz raytracer.bc -o tmp.bc

llc -filetype=obj tmp.bc -o tmp.o (there is no -Oz option for llc)

  2. Then I use clang++ -Oz directly:

clang++ -Oz -c raytracer.cpp -o raytracer.o

  3. Finally, I compare the two files tmp.o and raytracer.o:

diff tmp.o raytracer.o

It shows ‘Binary files tmp.o and raytracer.o differ’

Why can’t ‘opt -Oz’ produce the same result as ‘clang++ -Oz’, and how can I solve this? I am using LLVM 10.0.1 on CentOS 7.6.

I’d very much appreciate it if you could help me with this. Thank you.

Kind Regards,

Jiayu Zhao

Hi,

This is a known issue: clang -O3 is slightly different from opt -O3, and it is hard to reproduce exactly. It’d be great to refactor it all so that LLVM exposes a common way for frontends to run the exact same thing.

Hi,

I can reproduce the results of clang++ -O3 with opt -O3, but I cannot reproduce the results of clang++ -Oz with opt -Oz.

Just see the previous commands I used to produce the result of opt -O3.

In general you’ll find many cases where O3 does not reproduce either.

Have you tried llc -O2 in your case? Clang sets the backend optimization level that way for Oz/Os/O2: https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/BackendUtil.cpp#L430
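For example, in your -Oz flow that would look something like this (a sketch using the same file names as in your message):

opt -Oz raytracer.bc -o tmp.bc
llc -O2 -filetype=obj tmp.bc -o tmp.o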

Thank you for your reply. The default optimization level of llc is -O2. What’s more, I found that the executable files (.out) produced by clang directly and by llvm opt are the same in most cases, but the object files (.o) are different, for both -O3 and -Oz.

I use the following commands to produce the object files (.o):

  1. Produce the object file with opt -Oz:

clang++ -Oz -Xclang -disable-llvm-optzns -emit-llvm -c raytracer.cpp -o raytracer.bc

opt -Oz raytracer.bc -o tmp.bc

llc -filetype=obj tmp.bc -o tmp.o

  2. Produce the object file with clang++ directly:

clang++ -Oz -c raytracer.cpp -o raytracer.o

  3. Compare:

diff tmp.o raytracer.o

It always shows ‘Binary files tmp.o and raytracer.o differ’.

I use the following commands to produce the executable files (.out):

  1. Produce the executable file with opt -Oz:

clang++ -Oz -Xclang -disable-llvm-optzns -emit-llvm -c raytracer.cpp -o raytracer.bc

opt -Oz raytracer.bc -o tmp.bc

llc tmp.bc -o tmp.s

clang++ tmp.s -o tmp.out

  2. Produce the executable file with clang++ directly:

clang++ -Oz raytracer.cpp -o raytracer.out

  3. Compare:

diff tmp.out raytracer.out

We find that the two files are the same in most cases. However, sometimes they also differ. I guess this is because clang has its own optimizations, and these optimizations do not always run before the LLVM optimizations. If we use ‘clang -Oz -Xclang -disable-llvm-optzns’ to disable the LLVM passes and then use opt to apply them, the order of all optimizations will be different.

Jiayu Zhao

Hello Jiayu,

Could you detail your scenario a bit more? Do you need to run opt + llc, or would the following be acceptable:

(assuming a non-ThinLTO build)

  1. From your original clang invocation, run it again with clang -v and copy the command line it prints (it starts with -cc1).
  2. Re-run that command with -fembed-bitcode=all appended: clang -cc1 <…> -fembed-bitcode=all

OK, now you can take the resulting .o, let’s call it ‘file.o’, and do this:

llvm-objcopy --dump-section=.llvmbc=/some/path/file.bc file.o /dev/null
llvm-objcopy --dump-section=.llvmcmd=/some/path/file.cmd file.o /dev/null

You get 2 files, /some/path/file.bc and /some/path/file.cmd. The .cmd file contains the command line flags separated by ‘\0’. The .bc file is the IR.

If you now re-issue clang with those options (from the .cmd file*) over the .bc, you should get a bit-identical file.o.

Note that clang will skip over compiler directives like “-I” or -D or -U, or paths to pch files or modules (since it now picks up the compilation from IR). You can basically take the IR file and recompile it with the .cmd command line arguments on a different machine, away from the build directory structure where it came from.

  * You need to split the .cmd file ‘line’ by \0, of course.
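A rough sketch of how that could look in bash (using the hypothetical paths from above; depending on exactly which flags end up in .llvmcmd, the re-issued command line may need small adjustments):

tr '\0' '\n' < /some/path/file.cmd                   # inspect the embedded flags, one per line
mapfile -t -d '' CMD_FLAGS < /some/path/file.cmd     # bash >= 4.4: split the NUL-separated flags into an array
clang -cc1 "${CMD_FLAGS[@]}" /some/path/file.bc -o file.o   # re-issue the -cc1 invocation over the extracted IR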

I found that the executable files (.out) produced by clang directly and by llvm opt are the same in most cases, but the object files (.o) are different, for both -O3 and -Oz. […] We find that the two files are the same in most cases.

Is the only difference between the two methods that in the second case you’re using clang to invoke the assembler instead of letting llc do the assembly? If so, this seems fishy: assembling the file is very mechanical, so it would be surprising for the second case to match but not the first one. I’m interested in seeing a reproducer for this.

However, sometimes they also differ. I guess this is because clang has its own optimizations,

Not really: clang does not have separate optimizations from LLVM.

and these optimizations do not always run before the LLVM optimizations. If we use ‘clang -Oz -Xclang -disable-llvm-optzns’ to disable the LLVM passes and then use opt to apply them, the order of all optimizations will be different.

It isn’t so much the order as how everything is set up: passes take options in their constructors, and some analyses inserted into the pipeline have knowledge of the platform (like TargetLibraryInfo).
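If you want to rule that part out, one thing you could try (just a sketch, with an example triple) is passing the target triple explicitly so that opt and llc set up those target-aware analyses for the same platform the IR was generated for:

opt -Oz -mtriple=x86_64-unknown-linux-gnu raytracer.bc -o tmp.bc
llc -O2 -mtriple=x86_64-unknown-linux-gnu -filetype=obj tmp.bc -o tmp.o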

Q: Is the only difference between the two methods that in the second case you’re using clang to invoke the assembler instead of letting llc do the assembly?

clang++ -Oz -Xclang -disable-llvm-optzns -emit-llvm -c raytracer.cpp -o raytracer.bc
opt -Oz raytracer.bc -o tmp.bc

versus:

clang++ -Oz -emit-llvm -c raytracer.cpp -o raytracer.bc

The tmp.bc will differ from raytracer.bc, and in some cases this leads to different code sizes.
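One way to see the difference is to disassemble both and diff the textual IR, for example:

llvm-dis tmp.bc -o tmp.ll
llvm-dis raytracer.bc -o raytracer.ll
diff tmp.ll raytracer.ll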

Hi Mircea,

Thank you for your response. I need to run opt + my own pass + llc.

And I really want to know why the opt + llc path produces object files (.o) and executable files (.out) that differ from those produced by clang.

Have you tried using the -print-after-all option? (You can add it to both opt and llc, and I guess you can use “-mllvm -print-after-all” in the clang++-only case.)

That way I guess you could compare the results to find out where they start to differ.

It would be much easier to understand your problem if you could, for example, determine whether the difference appears in opt, in llc, or even later, and whether the diff is in debug info, in the code, in a data section, or somewhere else. Using -print-after-all could perhaps help with that.

Sometimes it can also be helpful to use -debug-pass=Executions (with the legacy pass manager) or -debug-pass-manager (with the new pass manager) to see a bit more about pass invocations, or to get some more anchors when comparing the -print-after-all output (although, as I think Mehdi hinted earlier, even when a pass is run it might have been configured slightly differently depending on other options, etc.). It could still help identify something obvious, such as not running the same passes in your two scenarios.
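For example, something like this (a sketch using the file names from earlier; note that the clang log also contains the backend passes, so only the middle-end part is expected to line up with the opt log):

opt -Oz -print-after-all raytracer.bc -o tmp.bc 2> opt.log
llc -O2 -print-after-all -filetype=obj tmp.bc -o tmp.o 2> llc.log
clang++ -Oz -mllvm -print-after-all -c raytracer.cpp -o raytracer.o 2> clang.log
diff opt.log clang.log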

Regards,

Björn