I am trying to use the ‘opt’ command with different optimization levels (-O3 and -Oz) on an IR file to reproduce the results of applying clang++ -O3 and -Oz directly to a C++ source file.
First, I use the following commands to produce the result of opt -O3:
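(The original commands are not shown in this thread; the following is a sketch of the typical three-stage pipeline being described, with placeholder file names under /tmp.)

```shell
# Assumed pipeline: emit unoptimized IR with the LLVM middle-end
# disabled, then run the same optimization level through opt and llc.
set -e
if command -v clang++ >/dev/null 2>&1 && command -v opt >/dev/null 2>&1 && command -v llc >/dev/null 2>&1; then
  cat > /tmp/test.cpp <<'EOF'
int square(int x) { return x * x; }
EOF
  # 1. Front end only: -O3 still sets function attributes, but
  #    -disable-llvm-optzns skips the middle-end passes.
  clang++ -O3 -Xclang -disable-llvm-optzns -emit-llvm -S /tmp/test.cpp -o /tmp/test.ll
  # 2. Middle end: run the -O3 pipeline standalone through opt.
  opt -O3 -S /tmp/test.ll -o /tmp/test.opt.ll
  # 3. Back end: note that llc defaults to -O2, so pass -O3 explicitly.
  llc -O3 /tmp/test.opt.ll -o /tmp/test.s
fi
```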
This is a known issue: clang -O3 is slightly different from opt -O3, and it is hard to reproduce exactly. It’d be great to refactor it all so that LLVM exposes a common way for frontends to run the exact same pipeline.
Thank you for your reply. The default optimization level of llc is -O2. What’s more, I found that the executable file (.out) produced directly by clang and the one produced via opt are the same in most cases, but the object files (.o) differ, for both -O3 and -Oz.
Use the following commands to produce object files (.o):
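Again the original commands are not shown; a sketch of the two routes presumably being compared, with placeholder file names:

```shell
# Two ways to produce a .o at -O3: directly via clang, and via
# clang (front end only) + opt + llc.
set -e
if command -v clang++ >/dev/null 2>&1 && command -v opt >/dev/null 2>&1 && command -v llc >/dev/null 2>&1; then
  cat > /tmp/obj.cpp <<'EOF'
int triple(int x) { return 3 * x; }
EOF
  # Route 1: let clang run the whole pipeline.
  clang++ -O3 -c /tmp/obj.cpp -o /tmp/direct.o
  # Route 2: standalone opt + llc, emitting an object file directly.
  clang++ -O3 -Xclang -disable-llvm-optzns -emit-llvm -S /tmp/obj.cpp -o /tmp/obj.ll
  opt -O3 -S /tmp/obj.ll -o /tmp/obj.opt.ll
  llc -O3 -filetype=obj /tmp/obj.opt.ll -o /tmp/via-opt.o
  # As noted in the thread, the object files can differ even when
  # the final executables end up the same.
  cmp -s /tmp/direct.o /tmp/via-opt.o && echo identical || echo different
fi
```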
We can find that the two files are the same in most cases. However, sometimes they are different. I guess it is because clang has its own optimizations, and these optimizations do not always run before the LLVM optimizations. If we use ‘clang -Oz -Xclang -disable-llvm-optzns’ to disable the LLVM passes and then use opt to apply them, the order of all optimizations will be different.
You get two files, /some/path/file.bc and /some/path/file.cmd. The .cmd file contains the command-line flags separated by ‘\0’. The .bc file is the IR.
If you now re-issue clang with those options (the ones in the .cmd file) over the .bc, you should get a bit-identical file.o.
Note that clang will skip over flags like -I, -D, or -U, and over paths to PCH files or modules (since it now picks up the compilation from IR). You can basically take the IR file and recompile it with the .cmd command-line arguments on a different machine, away from the build directory structure where it came from.
You need to split the .cmd file’s single ‘line’ on \0, of course.
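For illustration, splitting a NUL-separated file with standard tools (the example .cmd content below is made up):

```shell
# Build a small NUL-separated file shaped like the .cmd described
# above, then split it into one argument per line.
printf 'clang\0-O3\0-c\0file.bc' > /tmp/example.cmd

# xargs -0 splits on NUL; here each argument is echoed on its own line.
xargs -0 -n1 echo < /tmp/example.cmd

# Or join it into a single command line:
tr '\0' ' ' < /tmp/example.cmd; echo
```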
> Thank you for your reply. The default optimization level of llc is -O2. What’s more, I found that the executable file (.out) produced directly by clang and the one produced via opt are the same in most cases, but the object files (.o) differ, for both -O3 and -Oz.
> Use the following commands to produce object files (.o):
> We can find that the two files are the same in most cases.
Is the only difference between the two methods that in the second case you’re using clang to invoke the assembler instead of letting llc do the assembly? If so, it seems fishy that the second case would match but not the first one: assembling the file is very mechanical. I’m interested in seeing a reproducer for this.
> However, sometimes they are different. I guess it is because clang has its own optimizations,
Not really: clang does not have separate optimizations from LLVM.
> and these optimizations do not always run before the LLVM optimizations. If we use ‘clang -Oz -Xclang -disable-llvm-optzns’ to disable the LLVM passes and then use opt to apply them, the order of all optimizations will be different.
It isn’t so much the order as how the pipeline is set up: passes take options in their constructors, and some analyses inserted into the pipeline have knowledge of the platform (like TargetLibraryInfo).
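For example, TargetLibraryInfo is configured from the target triple. IR emitted by clang already embeds the triple and data layout, which opt picks up; a hand-written module does not, so opt’s -mtriple flag can supply it. A small illustration (the triple and file names are placeholders):

```shell
if command -v opt >/dev/null 2>&1; then
  cat > /tmp/tli.ll <<'EOF'
define i32 @f(i32 %x) {
  %r = mul i32 %x, %x
  ret i32 %r
}
EOF
  # Supply the triple explicitly so that target-aware analyses
  # (e.g. TargetLibraryInfo) are configured the same way they
  # would be inside clang's own pipeline.
  opt -O3 -mtriple=x86_64-unknown-linux-gnu -S /tmp/tli.ll -o /tmp/tli.opt.ll
fi
```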
Q: Is the only difference between the two methods that in the second case you’re using clang to invoke the assembler instead of letting llc do the assembly?
Have you tried using the -print-after-all option? (You can add it to both opt and llc, and I guess you can use “-mllvm -print-after-all” in the clang++-only case.)
That way I guess you could compare the results to find out where they start to differ.
It would be much easier to understand your problem if you could, for example, determine whether the difference appears in opt, in llc, or even later, and whether the diff is in debug info, in the code, in a data section, or somewhere else. Using -print-after-all could perhaps help with that.
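A sketch of that comparison (placeholder file names; -mllvm forwards the flag through the clang driver):

```shell
# Dump the IR after every pass in both pipelines, then diff the
# logs to find the first point of divergence.
if command -v clang++ >/dev/null 2>&1 && command -v opt >/dev/null 2>&1; then
  cat > /tmp/cmp.cpp <<'EOF'
int inc(int x) { return x + 1; }
EOF
  # In-clang pipeline: forward the flag with -mllvm (dumps go to stderr).
  clang++ -O3 -mllvm -print-after-all -c /tmp/cmp.cpp -o /tmp/cmp.o 2> /tmp/clang.log
  # Standalone pipeline: opt takes the flag directly.
  clang++ -O3 -Xclang -disable-llvm-optzns -emit-llvm -S /tmp/cmp.cpp -o /tmp/cmp.ll
  opt -O3 -print-after-all -S /tmp/cmp.ll -o /dev/null 2> /tmp/opt.log
  # Show where the dumps first differ.
  diff /tmp/clang.log /tmp/opt.log | head -n 20
fi
```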
Sometimes it can also be helpful to use -debug-pass=Executions (with the legacy PM) or -debug-pass-manager (with the new PM) to see a bit more about pass invocations, or to get some extra anchors when comparing the -print-after-all output. (As I think Mehdi hinted earlier, just because a pass is run doesn’t mean it was configured identically, since that can depend on other options.) But it could help identify something obvious, such as the two scenarios not running the same passes.
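For reference, a sketch of getting the executed-pass list out of opt (placeholder file names; the legacy-PM spelling is left as a comment since it only applies to older releases):

```shell
if command -v opt >/dev/null 2>&1; then
  cat > /tmp/passes.ll <<'EOF'
define i32 @g(i32 %x) {
  %r = add i32 %x, 1
  ret i32 %r
}
EOF
  # New pass manager: print each pass invocation as it runs (to stderr).
  opt -O2 -debug-pass-manager -S /tmp/passes.ll -o /dev/null 2> /tmp/npm.log
  # Legacy pass manager (older LLVM releases) would instead use:
  #   opt -O2 -debug-pass=Executions -S /tmp/passes.ll -o /dev/null
  head -n 5 /tmp/npm.log
fi
```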