I have code written in Fortran but with C/C++ kernels.
So I have to use both Flang and Clang to compile; maybe I should use the LLVM 5.0 release for the C/C++ code and Flang only for the Fortran code.
It should work, but I have not tried it yet.
LLD has -lto-newpm-passes (and the corresponding -lto-newpm-aa-pipeline), which allows you to pass a custom pass pipeline with full control. At one point I was using a similar modification to clang (see https://reviews.llvm.org/D21954) that never landed.
– Sean Silva
@Sean, do you mean llc?
For llc 4.0 and llc 5.0, I cannot find the -lto-newpm-passes option; is it a hidden one?
No, I meant LLD, the LLVM linker. This option for LLD is relevant for exploring different pass pipelines for link time optimization.
It is essentially equivalent to the -passes flag for ‘opt’.
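For example, a custom pipeline given to opt with the new pass manager might look roughly like the following (the pass selection and file names here are only placeholders; see `opt --help` for the exact spellings):
opt -passes='function(instcombine,simplifycfg),globaldce' input.bc -o output.bc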
Such a flag doesn’t make much sense for ‘llc’ because llc mostly runs backend passes, which are much more difficult to construct custom pipelines for (backend passes are often required for correctness or have complex ordering requirements).
– Sean Silva
Hi,
“SetC” options are LLVM cl::opt options; they are intended for LLVM developers and experimentation. If a setting is intended to be used as a public API, there is usually a programmatic way of setting it in LLVM.
“SetA” is what clang as a C++ compiler exposes to the end-user. Internally, clang will (most of the time) use one or more LLVM APIs to propagate a setting.
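As a rough illustration of the two sets (the specific option names here are examples, not a complete mapping):
clang++ -c -O2 -funroll-loops foo.cc              # "SetA": an end-user clang driver flag
opt -unroll-threshold=500 foo.bc -o foo.opt.bc    # "SetC": an LLVM cl::opt, developer-facing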
Best,
@Sean, here is my summary of several tools.
Format: (ID, tool, input → output, timing, customization, questions)
1. llc: 1 bc → 1 obj, back-end compile time (code generation and machine-dependent optimizations), difficult to customize the pipeline, N/A
2. LLD: all bc files and obj files → 1 binary (passing -flto to clang for *.bc file generation), back-end link-time optimizations plus code generation and machine-dependent optimizations, easy to customize the pipeline with -lto-newpm-passes, question: what is the connection between -lto-newpm-passes and -lto-newpm-aa-pipeline, and how do I use -lto-newpm-passes to customize the pipeline?
3. gold: mixed obj files and bc files → 1 binary (passing -flto to clang for *.bc file generation), back-end link-time optimization with LLVMgold.so plus code generation and machine-dependent optimizations, unaware of whether it is customizable by means of command-line options, question: can we consider LLD a more customizable gold from the perspective of pipeline customization?
4. opt: 1 bc file → 1 bc file at a time, middle-end machine-independent optimizations (maybe others?), easy to customize the pipeline by means of command-line options, N/A
5. llvm-link: many *.bc files → 1 bc file, link time (unknown whether there is any optimization; also unknown why it exists), unknown how to customize, N/A
With the above understanding, there are several ways to fine-tune the clang/llvm optimization pipeline:
1. clang (C/C++ to bc translation, with minimal front-end optimizations, via -emit-llvm -O1 -Xclang -disable-llvm-passes) → opt (with a customizable middle-end optimization pipeline for each bc file independently) → gold (un-customizable back-end link-time optimization and code generation); a command sketch follows below.
2. clang (C/C++ to bc translation, with minimal front-end optimizations, via -flto) → opt (same as in 1) → lld (with -lto-newpm-passes for link-time optimization pipeline customization; how?)
3. clang (C/C++ to *.bc translation and optimization, customizable by means of clang command-line options, maybe including both front-end and middle-end optimizations). Without explicitly specifying an opt optimization pipeline, there may still be middle-end optimizations happening; also, without explicitly specifying a linker, it may use GNU ld / GNU gold / lld as the linker, with whichever one's default link-time optimization pipeline.
So, it seems to me that pipeline 2 is the most customizable, with independently customizable middle-end and back-end pipelines; pipeline 1 has only a customizable middle-end optimization pipeline; and pipeline 3 offers the least control over the optimization pipeline by means of the clang command line.
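To make pipeline 1 concrete, the commands could look roughly like this (file names are placeholders, and this assumes gold has the LLVMgold.so plugin available so it can consume bitcode at link time):
clang++ -c -emit-llvm -O1 -Xclang -disable-llvm-passes kernel.cc -o kernel.bc   # front end only
opt -O2 kernel.bc -o kernel.opt.bc                                              # customizable middle end
clang++ -flto -fuse-ld=gold kernel.opt.bc -o app                                # gold plugin does LTO codegen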
Thanks for your time, and I welcome any comments!
Thanks a lot, Mehdi.
For GCC, there are around 190 optimization flags exposed as command-line options.
For Clang/LLVM, the number is 40, and many important optimization parameters are not exposed at all, such as the loop unrolling factor and inline function size parameters.
I understand there are very different opinions on whether or not to expose many flags to the end-user.
Personally, I believe it is reasonable to keep the end-user-controllable command-line options minimal for user-friendliness.
However, for users who care a lot about even a tiny bit of performance improvement, like the HPC community, it may be better to expose as many fine-grained tunables in the form of command-line options as possible. Or, at least, there should be a way to achieve this fairly easily.
I am curious about which way is the best for my purpose.
Please see my latest reply for 3 possible fine-grained optimization pipelines.
Looking forward to more discussions.
Thanks a lot!
Hi Toddy,
You can achieve what you’re looking for with a pipeline based on clang -Ox + opt -Ox + llc -Ox (or lld instead of llc), but this won’t be guaranteed to be well supported across releases of the compiler.
Otherwise, if there are some performance-related (or not…) command-line options you think clang is missing / would benefit from, I invite you to propose adding them on cfe-dev@lists.llvm.org and submit a patch!
Best,
You just specify the list of passes to run, as you would with opt -passes.
-lto-newpm-aa-pipeline has the same function as opt’s -aa-pipeline option.
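Concretely, passing a custom LTO pipeline through the clang driver to LLD might look something like this (-fuse-ld=lld and the exact spelling of the linker option are assumptions to verify against your LLD version; the pipeline strings are placeholders):
clang++ -c -flto kernel.cc main.cc                                   # bitcode "object" files
clang++ -flto -fuse-ld=lld -Wl,-lto-newpm-passes=<pass pipeline> -Wl,-lto-newpm-aa-pipeline=<aa pipeline> kernel.o main.o -o app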
Gold and LLD are very similar for this purpose, and LLD has some extra goodies like -lto-newpm-passes
llvm-link doesn’t perform optimizations.
The thing customized by -lto-newpm-passes is actually a middle-end pipeline run during link time optimization. The backend is not very customizable.
Also, note that with a clang patch like the one I linked, you don’t need opt because you can directly tell clang what fine-grained set of passes to run in the middle end.
One approach I have used in the past is to compile with -flto -O0 -disable-O0-optnone and then do all optimizations at link time. This can simplify things because you only need to re-run the link command (it still takes a long time, but with sufficient RAM (and compiling without debug info) you should be able to run multiple different pass pipelines in parallel). If your only goal is to test middle-end pass pipelines (e.g. synergies of different passes), then that can be a good approach. However, keep in mind that this is really just a small part of the larger design problem of getting the best code with the best compile time. In practice, profile feedback (and making good use of it), accurate cost modeling (especially in the middle end for optimizations like inlining and loop unrolling), and appropriate link-time cross-module optimization tend to matter just as much as (or more than) a particularly intelligently chosen sequence of passes.
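A rough sketch of that workflow (note that -disable-O0-optnone is a cc1 option, so from the clang driver it likely needs to go through -Xclang; file names and pipeline strings are placeholders):
clang++ -c -flto -O0 -Xclang -disable-O0-optnone kernel.cc main.cc                        # compile once
clang++ -flto -fuse-ld=lld -Wl,-lto-newpm-passes=<pipeline A> kernel.o main.o -o app-A    # link with pipeline A
clang++ -flto -fuse-ld=lld -Wl,-lto-newpm-passes=<pipeline B> kernel.o main.o -o app-B    # re-link with pipeline B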
Also, keep in mind that even though we in principle have a lot of flexibility with the sequence of passes in the middle end, in practice a lot of tuning and bug fixing has been done with the default O2/O3 pipelines. If you deviate from them, you may end up with pretty silly code. An example from my recent memory was that an inopportune run of GVN can cause a loop to have an unexpected set of induction variables, throwing off other optimizations.
– Sean Silva
Hi Mehdi,
Now we have 5 pipelines. (In addition to the first 3, which I have described in detail above; please refer to my latest reply for details.)
1. clang + opt + gold
2. clang + opt + lld
3. clang + GNU ld / gold / lld
4. clang + opt + llc + clang (see the command sketch after this list):
- clang -emit-llvm -O1 -Xclang -disable-llvm-passes for C/C++ to .bc generation and minimal front-end optimization
- opt for optimizing a single bc file
- llc for single-bc-file to obj-file generation and back-end optimization (no link-time optimization is possible, since llc works on 1 bc file at a time)
- clang again for linking all obj files to generate the final executable. (Although in principle there could be link-time optimization even with only obj files, it requires a lot of work and is machine-dependent. This may also be why modern compilers like LLVM/GCC/ICC, etc. perform LTO not at the obj level. But obj-level LTO might yield extra benefit even after LTO at the intermediate level has been applied by the compiler, because the obj level can see more information.)
The clang -Ox + opt -Ox + llc -Ox combination is too coarse-grained.
5. Modify clang to align with GCC/ICC so that many tunables are exposed on the clang command line. I am not sure how much work is needed, but it at least requires an overall understanding of the compiler internals, which can be gradually figured out.
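For concreteness, pipeline 4 could look something like this (the -O levels and file names are only examples):
clang++ -c -emit-llvm -O1 -Xclang -disable-llvm-passes LULESH.cc -o LULESH.bc
opt -O2 LULESH.bc -o LULESH.opt.bc                 # middle-end passes of your choice
llc -O2 -filetype=obj LULESH.opt.bc -o LULESH.o    # per-file code generation
clang++ LULESH.o -o lulesh                         # plain link, no LTO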
I believe pipeline 5 is interesting, but pipeline 2 may be good enough. More experiments are needed before a decision is made.
For the types of things that you are looking for, you may just want to try a bunch of -mllvm options. You can tune the inlining and unrolling thresholds that way, for example.
-mllvm Additional arguments to forward to LLVM’s option processing
This is what clang dumps in its help output. I am not sure what I am supposed to put as the value in order to tune the unrolling/inlining thresholds.
Hi Sean,
Please check my inlined reply.
Looking forward to your comments.
Thanks for your time!
As the help says, this is used to pass arguments to LLVM itself. If you remember your earlier question about setA (clang options) and setC (opt options), this allows you to reach setC from the clang command line.
Any option that you see in the output of `opt --help` can be set from clang using `-mllvm`. Same caveat as I mentioned before: these aren't supposed to be end-user options.
Hi Mehdi,
It seems -mllvm does not work as expected. Anything wrong?
[twang15@c92 temp]$ clang++ -O3 -mllvm -deadargelim LULESH.cc
clang (LLVM option parsing): Unknown command line argument '-deadargelim'. Try: 'clang (LLVM option parsing) -help'
clang (LLVM option parsing): Did you mean '-regalloc'?
[twang15@c92 temp]$ clang++ -O3 -mllvm deadargelim LULESH.cc
clang (LLVM option parsing): Unknown command line argument 'deadargelim'. Try: 'clang (LLVM option parsing) -help'
-Tao
You can't schedule passes this way, only set parameters
like -unroll-threshold=<uint> etc.
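For example, something along these lines should be accepted (the exact option names used here, -unroll-threshold and -inline-threshold, are assumptions to verify):
clang++ -O3 -mllvm -unroll-threshold=500 -mllvm -inline-threshold=1000 LULESH.cc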
Where can I find options like -unroll-threshold=<uint>? I cannot find them in either opt -help or clang -help.
This one shows up in `opt --help-hidden`. Otherwise, it is in the source code for each transformation.
(Remember when I mentioned these are intended for LLVM developers and are not end-user facing?)
Mehdi,
I found that -unroll-max-count can be passed with -mllvm.
-dce, -adce, etc. are also dumped by ‘opt --help-hidden’. However, they cannot be passed with -mllvm.
Is this what “You can’t schedule passes this way, only set parameters like -unroll-threshold= etc.” means?
[twang15@c89 temp]$ clang++ -mllvm -unroll-max-count=4 -mllvm -dce -save-temps LULESH.cc
clang (LLVM option parsing): Unknown command line argument '-dce'. Try: 'clang (LLVM option parsing) -help'
clang (LLVM option parsing): Did you mean '-mv4'?
Yes, that is what he meant. -dce, -adce, etc. are command-line options consumed by tools/opt/opt.cpp to give to the PassManagerBuilder that it creates. The parsing of those options doesn’t exist in any of the LLVM library code that is linked into clang. Clang has its own code for populating a PassManagerBuilder in tools/clang/lib/CodeGen/BackendUtil.cpp.
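So those pass names do work when given to opt directly, just not through clang’s -mllvm; for example (the second form uses the new pass manager’s -passes syntax, and the file names are placeholders):
opt -adce -dce LULESH.bc -o LULESH.opt.bc
opt -passes='adce,dce' LULESH.bc -o LULESH.opt.bc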