I took a look at the source code of the clang tool and the opt tool; they both seem to use PassManagerBuilder::populateModulePassManager() and PassManagerBuilder::populateFunctionPassManager() to add passes to their optimization pipelines, and for the backend, clang and llc both use addPassesToEmitFile() to generate object code.
So presumably the two approaches above to generating an optimized executable should do the same thing. However, I am seeing that the second approach is consistently around 2% slower than the first approach (which is the way developers usually build).
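To be concrete, the two pipelines I am comparing look roughly like this (file names are just illustrative):

clang -O3 source.c -o a.out

versus

clang -O0 -c -emit-llvm source.c -o source.bc
opt -O3 source.bc -o source.opt.bc
llc -O3 -filetype=obj source.opt.bc -o source.o
clang source.o -o a.out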
Can anyone point me to the reasons why this happens? Or even correct my wrong understanding of the relationship between these two approaches?
PS: I used the -debug-pass=Structure option to print out the passes; they seem the same except that the first approach has an extra pass called “-add-discriminator”, but I don’t think that’s the reason.
clang -O0 does not disable all passes that modify the IR; in fact, it causes most functions to get tagged with noinline to prevent inlining.
It also disables emission of lifetime intrinsics, TBAA metadata, etc.
What you really need to do is:
clang -O3 -c -emit-llvm source.c -o source.bc -v
Find the -cc1 command line in that output, then execute that command with -disable-llvm-passes added; leave the -O3 and everything else in place.
That’s a bit complicated; cc1 options can be passed through with -Xclang, so here you can just add -Xclang -disable-llvm-passes to the regular clang invocation.
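In other words, the front-end step of the split pipeline would look something like this (file names illustrative; keep whatever other flags you already use):

clang -O3 -Xclang -disable-llvm-passes -c -emit-llvm source.c -o source.bc
opt -O3 source.bc -o source.opt.bc
llc -O3 -filetype=obj source.opt.bc -o source.o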
@Toddy, I think I had some misunderstanding about the Clang command line options when I posted the question.
I think pipelines 1 and 3 are supposed to have only a trivial difference, while pipeline 2 is supposed to be much slower than the other two, because the -O0 option in pipeline 2 disables some important passes in opt (even if you pass -O3 to opt).
I checked the IR generated by pipelines 2 and 3 and saw that they are not the same (e.g., pipeline 3 emits IR with more alias information that opt can use). And what I did was exactly pipeline 2, mistakenly thinking it would be equivalent to pipeline 1. So from my understanding, if you want to use the clang-opt-llc pipeline, you need to stick with pipeline 3, where the -O3 -Xclang -disable-llvm-passes options tell clang to generate unoptimized IR that can later be fully optimized, just as clang -O3 does directly.
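One easy way to compare them (just a sketch, with illustrative file names) is to emit the bitcode both ways, disassemble it with llvm-dis, and diff the textual IR:

clang -O0 -c -emit-llvm source.c -o pipe2.bc
clang -O3 -Xclang -disable-llvm-passes -c -emit-llvm source.c -o pipe3.bc
llvm-dis pipe2.bc
llvm-dis pipe3.bc
diff pipe2.ll pipe3.ll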
clang -O0 adds an "optnone" attribute to each function that causes most optimization passes to skip that function. Avoid it with "-Xclang -disable-O0-optnone".
If you pass -O0 to clang, most functions will be tagged with an optnone function attribute that prevents opt and llc from optimizing them, even if you pass -O3 to opt and llc. This is the most likely cause of the slowdown in pipeline 2.
You can disable the optnone attribute behavior by passing -Xclang -disable-O0-optnone to clang.
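For example, with a recent clang, plain -O0 produces IR roughly like the following for every function (just a sketch; the exact attribute set depends on the version and target), and the noinline/optnone attributes in the attribute group are what the passes check before skipping the function:

define i32 @foo(i32 %x) #0 {
entry:
  %x.addr = alloca i32, align 4
  store i32 %x, i32* %x.addr, align 4
  %0 = load i32, i32* %x.addr, align 4
  %add = add nsw i32 %0, 1
  ret i32 %add
}

attributes #0 = { noinline nounwind optnone }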
@Craig and @Michael, with clang 4.0.1, -Xclang -disable-O0-optnone gives the following error message. Starting with which version is -disable-O0-optnone supported?
-O0 didn’t start applying optnone until r304127 in May 2017, which is after the 4.0 family was branched. So only 5.0, 6.0, and trunk have that behavior. Commit message copied below.
I’m actually confused by clang optimization flags.
If I run clang -help, it shows many optimization options (denoted as set A) and non-optimization options (denoted as set B).
If I run llvm-as < /dev/null | opt -O0/1/2/3 -disable-output -debug-pass=Arguments, it also shows many optimization flags (denoted as set C).
There are many options in set C that are not in set A, and also options in set A that are not in set C.
The general question is: what is the relationship between set A and set C, at the same optimization level O0/O1/O2/O3?
Another question: how do I specify an option from set C as a clang command-line option if it is not in set A?
For example, -dse is in set C but not in set A; how can I specify it as a clang option? Or is that simply not possible?
I don’t think clang -help prints options for individual optimizations. Clang itself doesn’t have direct support for fine-grained optimization control, just the flags for the levels -O0/-O1/-O2/-O3. This is intended to be a simple and sufficient interface for most users who just want to compile their code. So I don’t think there’s a way to pass just -dse to clang.
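If what you want is just to see which LLVM passes clang itself schedules at a given level (roughly your set C, as seen from clang), I believe you can forward the same debug option through -mllvm, something like this (assuming a clang built around the legacy pass manager):

clang -O2 -mllvm -debug-pass=Arguments -c test.c -o /dev/null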
opt, on the other hand, is more of a utility for LLVM developers that provides fine-grained control over optimizations for testing purposes.
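For example, if you want to run just dead store elimination, you can do it on bitcode emitted by clang, along these lines (a sketch using the legacy pass manager flag that matches the -debug-pass output above; -disable-O0-optnone matters here for the reason discussed earlier in the thread):

clang -O0 -Xclang -disable-O0-optnone -c -emit-llvm test.c -o test.bc
opt -dse test.bc -o test.dse.bc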
What I am trying to do is compile a program with different sets of optimization flags.
If there is no fine-grained control over clang's optimization flags, it will be impossible to achieve what I intend.
Although there is fine-grained control via opt, for large-scale projects the clang-opt-llc pipeline may not be a drop-in solution.
I have no idea about the root cause.
Maybe it's due to debug symbols, but I already use -DCMAKE_BUILD_TYPE=Release.
Anyway, I believe there is a bug somewhere.
Why are you using build directions from “flang”, which is a Fortran compiler maintained by different people than the LLVM/clang community, but then compiling C/C++ code? Their bug tracker should be used for filing bugs against the Fortran compiler, not for a C/C++ compiler issue.