Relationship between clang, opt and llc

Hi folks,

I am wondering about the relationship between clang, opt and llc. I understand that this has been asked before, e.g., http://stackoverflow.com/questions/40350990/relationship-between-clang-opt-llc-and-llvm-linker. Sorry for posting a similar question again, but there is still something I haven’t been able to resolve.

More specifically, I am wondering about the following two approaches to compiling an optimized executable:

  1. clang -O3 -c source.c -o source.o

    clang a.o b.o c.o … -o executable

  2. clang -O0 -c -emit-llvm source.c -o source.bc
    opt -O3 source.bc -o source.bc
    llc -O3 -filetype=obj source.bc -o source.o

    clang a.o b.o c.o … -o executable

I took a look at the source code of the clang tool and the opt tool. They both seem to use the PassManagerBuilder::populateModulePassManager() and PassManagerBuilder::populateFunctionPassManager() functions to add passes to their optimization pipelines, and for the backend, clang and llc both use the addPassesToEmitFile() function to generate object code.

So presumably the above two approaches to generating an optimized executable should do the same thing. However, I am seeing pretty consistently that the second approach is around 2% slower than the first approach (which is the way developers usually build).

Can anyone point me to the reasons why this happens? Or even correct my wrong understanding of the relationship between these two approaches?

PS: I used the -debug-pass=Structure option to print out the passes, and they seem the same except that the first approach has an extra pass called “-add-discriminator”, but I don’t think that’s the reason.

Peizhao

clang -O0 does more than skip the optimization passes: it also changes the IR that is emitted. In fact it causes most functions to get tagged with noinline to prevent inlining.

What you really need to do is

clang -O3 -c -emit-llvm source.c -o source.bc -v

Find the -cc1 command line in that output. Execute that command with -disable-llvm-passes added. Leave the -O3 and everything else in place.

You should be able to feed the output from that command to opt/llc and get consistent results.
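The -cc1 line can also be printed without actually compiling anything, using the driver's -### option (it prints the commands the driver would run). A minimal sketch, assuming the input file is called source.c; the helper name cc1_lines is made up for this example:

```shell
#!/bin/sh
# cc1_lines: keep only the lines on stdin that contain a -cc1 invocation.
# (Hypothetical helper; it is just a text filter.)
cc1_lines() {
    grep -e '-cc1'
}

# If clang and the (assumed) input file are present, print the -cc1 command
# for an -O3 compile. The driver writes this output to stderr, hence 2>&1.
if command -v clang >/dev/null 2>&1 && [ -f source.c ]; then
    clang -O3 -c -emit-llvm source.c -o source.bc '-###' 2>&1 | cc1_lines
fi
```

The printed line can then be copied, extended with -disable-llvm-passes, and run by hand.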

clang -O0 does more than skip the optimization passes: it also changes the IR that is emitted. In fact it causes most functions to get tagged with noinline to prevent inlining.

It also disables lifetime intrinsics emission, TBAA, etc.

What you really need to do is

clang -O3 -c -emit-llvm source.c -o source.bc -v

Find the -cc1 command line in that output. Execute that command with -disable-llvm-passes added. Leave the -O3 and everything else in place.

That’s a bit complicated. CC1 options can be passed through the driver with -Xclang, so here you can just add -Xclang -disable-llvm-passes to the regular clang invocation.

Best,

Thanks, Craig! That totally corrects my wrong understanding about the front end. I tried it and indeed now I can see almost the same results.

Best,
Peizhao

It’s really nice of you to point out the -Xclang option; it makes things much easier. I really appreciate your help!

Best,
Peizhao

I tried the following on LULESH1.0 serial version (https://codesign.llnl.gov/lulesh/LULESH.cc)

  1. clang++ -O3 LULESH.cc; ./a.out 20
    Runtime: 9.487353 seconds

  2. clang++ -O0 -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc; opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o b.out; ./b.out 20
    Runtime: 24.15 seconds

  3. clang++ -O3 -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc; opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o b.out; ./b.out 20
    Runtime: 9.53 seconds

1 and 3 have almost the same performance, but 2 is significantly worse. I expected 1, 2 and 3 to differ only trivially.

Is this a wrong expectation?

@Peizhao, what did you try in your last post?

@Toddy, I think I had some misunderstanding about the Clang command line options when I posted the question.

I think pipeline 1 and 3 are supposed to have only trivial difference, while pipeline 2 is supposed to be much slower than the other two because the “-O0” option in pipeline 2 can disable some of the important passes in opt (even if you use “-O3” with opt).

I tried to check the IRs generated by pipeline 2 and 3 and saw that they are not the same (e.g., pipeline 3 emits IR with more alias info that can be used in opt). And what I did was exactly pipeline 2 (mistakenly thinking it would be equivalent to pipeline 1). So from my understanding, if you want to use the clang-opt-llc pipeline, you may need to stick with pipeline 3, where the “-O3 -Xclang -disable-llvm-passes” options tell clang to generate unoptimized IR that can be later fully optimized as in “clang -O3” directly.
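For reference, pipeline 3 can be wrapped in a small script. This is only a sketch: build_split and run are made-up helper names, the intermediate file names a.bc, b.bc and b.o are the ones used earlier in this thread, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# build_split SRC OUT: the front end runs at -O3 but emits unoptimized IR,
# then opt optimizes it, llc generates the object file, and clang++ links.
build_split() {
    run clang++ -O3 -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc "$1" &&
    run opt -O3 a.bc -o b.bc &&
    run llc -O3 -filetype=obj b.bc -o b.o &&
    run clang++ b.o -o "$2"
}

# Example: show the four commands for LULESH.cc.
build_split LULESH.cc b.out
```

Running it with DRY_RUN=0 in the environment executes the same four steps for real, stopping at the first one that fails.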

clang -O0 adds an "optnone" attribute to each function, which causes most optimization passes to skip that function. Avoid this with "-Xclang -disable-O0-optnone".

Michael

If you pass -O0 to clang, most functions will be tagged with an optnone function attribute that prevents opt and llc from optimizing them, even if you pass -O3 to opt and llc. This is the most likely cause of the slowdown in 2.

You can disable the optnone function attribute behavior by passing “-Xclang -disable-O0-optnone” to clang
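One way to see this for yourself is to dump textual IR and count how often the attributes show up. A rough sketch, assuming an input file named source.c and a clang new enough to know -disable-O0-optnone; count_attr is a made-up helper name:

```shell
#!/bin/sh
# count_attr ATTR FILE: count the lines of a textual IR file that mention
# ATTR as a whole word. Prints 0 when there is no match.
count_attr() {
    grep -c -w -e "$1" "$2" || :
}

# Compare plain -O0 output with -O0 plus -disable-O0-optnone.
# Skipped when clang or the (assumed) input file is missing.
if command -v clang >/dev/null 2>&1 && [ -f source.c ]; then
    clang -O0 -S -emit-llvm source.c -o O0.ll &&
    clang -O0 -Xclang -disable-O0-optnone -S -emit-llvm source.c -o O0-no-optnone.ll &&
    echo "optnone lines, -O0:             $(count_attr optnone O0.ll)" &&
    echo "optnone lines, no-optnone flag: $(count_attr optnone O0-no-optnone.ll)" &&
    echo "noinline lines, no-optnone flag: $(count_attr noinline O0-no-optnone.ll)"
fi
```

The attributes normally appear on the "attributes #N = { ... }" lines near the end of the .ll file, so the counts are per attribute group rather than per function.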

@Zhaopei, thanks for the clarification.

@Craig and @Michael, with clang 4.0.1, -Xclang -disable-O0-optnone gives the following error message. Starting from which version is -disable-O0-optnone supported?

[twang15@c89 temp]$ clang++ -O0 -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc
error: unknown argument: ‘-disable-O0-optnone’

[twang15@c89 temp]$ clang++ --version
clang version 4.0.1 (tags/RELEASE_401/final)
Target: x86_64-unknown-linux-gnu

-O0 didn’t start applying optnone until r304127 in May 2017, which is after the 4.0 family was branched. So only 5.0, 6.0, and trunk have that behavior. The commit message is copied below.

Author: Mehdi Amini <joker.eph@gmail.com>

IRGen: Add optnone attribute on function during O0

Amongst other, this will help LTO to correctly handle/honor files
compiled with O0, helping debugging failures.
It also seems in line with how we handle other options, like how
-fnoinline adds the appropriate attribute as well.

Differential Revision: https://reviews.llvm.org/D28404

Craig, thanks a lot!

I’m actually confused by clang optimization flags.

If I run clang -help, it will show many optimizations (denoted as set A) and non-optimization options (denoted as set B).
If I run llvm-as < /dev/null | opt -O0/1/2/3 -disable-output -debug-pass=Arguments, it also shows many optimization flags (denoted as set C).

There are many options in set C that are not in set A, and also options in set A that are not in set C.

The general question is: what is the relationship between set A and set C, at the same optimization level O0/O1/O2/O3?
Another question is: how can I specify an option from set C as a clang command line option if it is not in A?

For example, -dse is in set C but not in set A. How can I specify it as a clang option? Or is that simply not possible?
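To compare what opt runs at two levels, the -debug-pass=Arguments output can be split into one pass per line and diffed. A sketch under the assumption that the output has the "Pass Arguments:  -a -b ..." shape printed by the LLVM versions discussed here; pass_names is a made-up helper name:

```shell
#!/bin/sh
# pass_names: read -debug-pass=Arguments output on stdin and print
# one pass flag per line, ignoring every other line.
pass_names() {
    sed -n 's/^Pass Arguments: *//p' | tr -s ' ' '\n' | sed '/^$/d'
}

# List the passes that appear at -O3 but not at -O1.
# Skipped when the LLVM tools are not on PATH.
if command -v opt >/dev/null 2>&1 && command -v llvm-as >/dev/null 2>&1; then
    llvm-as < /dev/null | opt -O1 -disable-output -debug-pass=Arguments 2>&1 | pass_names | sort -u > O1.passes
    llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments 2>&1 | pass_names | sort -u > O3.passes
    comm -13 O1.passes O3.passes
fi
```

Sorting drops the ordering and the repeats, so this only answers which passes are present, not where in the pipeline they run.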

I don’t think “clang -help” prints options about optimizations. Clang itself doesn’t have direct support for fine grained optimization control, just the flags for the levels -O0/-O1/-O2/-O3. This is intended to be a simple and sufficient interface for most users who just want to compile their code. So I don’t think there’s a way to pass just -dse to clang.

opt on the other hand is more of a utility for developers of llvm that provides fine grained control of optimizations for testing purposes.
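As an illustration of that fine grained control, a single pass such as dse can be requested from opt directly. The sketch below only prints the command line; single_pass_cmd is a made-up helper, a.bc and b.bc are placeholder file names, and the two spellings are the legacy one used by the LLVM versions in this thread and the -passes= form of the newer pass manager.

```shell
#!/bin/sh
# single_pass_cmd PASS IN OUT [STYLE]: print an opt command that runs one pass.
# STYLE is "legacy" (default, e.g. opt -dse) or "new" (e.g. opt -passes=dse).
single_pass_cmd() {
    case ${4:-legacy} in
        legacy) printf 'opt -%s %s -o %s\n' "$1" "$2" "$3" ;;
        new)    printf 'opt -passes=%s %s -o %s\n' "$1" "$2" "$3" ;;
        *)      echo "unknown style: $4" >&2; return 1 ;;
    esac
}

single_pass_cmd dse a.bc b.bc        # legacy spelling
single_pass_cmd dse a.bc b.bc new    # new pass manager spelling
```

The input bitcode would come from clang with -Xclang -disable-llvm-passes, as earlier in the thread, so that the pass sees IR the front end has not already optimized.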

After building LLVM 5.0, I found that clang-5.0 is extremely slow, even though it was built with -DCMAKE_BUILD_TYPE=Release.

When building LULESH.cc, it gets stuck at the link stage.

I built it as instructed here: https://github.com/flang-compiler/flang

Maybe I should submit a bug.

[twang15@c92 temp]$ time clang++ -v -O3 LULESH.cc
clang version 5.0.1 (https://github.com/flang-compiler/clang.git 64043d5cec9fb02d1b0fd80c9f2c4e9e4f09cf8f) (https://github.com/llvm-mirror/llvm.git 1368f4044e62cad4316da638d919a93fd3ac3fe6)
Target: x86_64-unknown-linux-gnu

real 2m21.979s
user 2m21.842s
sys 0m0.081s

What I am trying is to compile a program with different sets of optimization flags.
If there is no fine-grained control over clang optimization flags, it would be impossible to achieve what I intend.

Although there is fine-grained control via opt, for large-scale projects the clang-opt-llc pipeline may not be a drop-in solution.

@Craig and @Michael

After installing clang-5.0 (downloaded from http://releases.llvm.org, which does not have the slowdown of the Flang build mentioned above),

  1. clang++ -O0 -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc; opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o b.out; ./b.out 20
    runtime: 2.354069e+01

  2. clang++ -O1 -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc; opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o b.out; ./b.out 20
    runtime: 9.046271e+00

  3. clang++ -O3 LULESH.cc
    runtime: 9.118835e+00

  4. clang++ -O2 -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc; opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o b.out; ./b.out 20
    runtime: 9.091278e+00

  5. clang++ -O3 -Xclang -disable-O0-optnone -Xclang -disable-llvm-passes -c -emit-llvm -o a.bc LULESH.cc; opt -O3 a.bc -o b.bc; llc -O3 -filetype=obj b.bc -o b.o ; clang++ b.o -o b.out; ./b.out 20
    runtime: 9.096919e+00

Apparently, clang++ -O0 -Xclang -disable-O0-optnone does not work as expected.

The conclusion seems to be that -Xclang -disable-O0-optnone works when the clang optimization level is O1/O2/O3, but not O0.

Any comments?

-disable-O0-optnone has no effect with anything other than -O0.

-O0 being passed to clang also causes all functions to be marked noinline. I don’t know if there is a command line option to turn that off.

I recommend passing “-O1 -Xclang -disable-llvm-passes” to clang. Passing -O0 very specifically means disable optimizations.

Thanks a lot, it is clear to me now.

BTW, for Clang’s slowdown, I submitted an issue here: https://github.com/flang-compiler/flang/issues/356

I have no idea about the root cause.
Maybe due to debug symbols. But, I already use -DCMAKE_BUILD_TYPE=Release.
Anyway, I believe there is a bug somewhere.

Why are you using build directions from “flang”, which is a Fortran compiler maintained by different people than the LLVM/clang community, and then compiling C/C++ code? Their bug database should be used for filing bugs against the Fortran compiler, not for a C/C++ compiler issue.