Clang-14.0.6 performance optimization

koti · October 6, 2022, 11:44am

Dear All,

Could you please let me know what techniques can be used to optimize performance with the clang-14.0.6 compiler when building C/C++ code on a Linux machine?

Regards
Koti

Wongboo · October 7, 2022, 6:27am

-march=native -flto -fomit-frame-pointer -O3 -DNDEBUG

koti · October 7, 2022, 6:46am

Dear Wongboo

Thank you for your input. Should the flags mentioned in the previous comment be passed when building the clang package itself, so that the rebuilt clang binaries/libraries are then used for compiling C/C++ code?

Regards
Koti

tobiashieta · October 7, 2022, 6:49am

Do you want a faster compiler or faster code built with the compiler?

To make your compiler faster the common techniques are:

LTO (-DLLVM_ENABLE_LTO=THIN)
PGO ( How To Build Clang and LLVM with Profile-Guided Optimizations — LLVM 16.0.0git documentation)
Use another memory allocator (jemalloc, rpmalloc etc)
BOLT ( llvm-project/OptimizingClang.md at main · llvm/llvm-project (github.com))
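
For the allocator item in the list above, one low-effort way to experiment is to preload the allocator into an existing clang binary instead of rebuilding it. This is only a sketch: the library path is an assumption (it is where Debian/Ubuntu packages usually install jemalloc), and `run_with_jemalloc` is a hypothetical helper name, not something from LLVM.

```shell
# Assumed install location of jemalloc; override JEMALLOC_LIB for other distros.
JEMALLOC_LIB="${JEMALLOC_LIB:-/usr/lib/x86_64-linux-gnu/libjemalloc.so.2}"

# Hypothetical helper: run any command with jemalloc preloaded if it is available.
run_with_jemalloc() {
    if [ -f "$JEMALLOC_LIB" ]; then
        # LD_PRELOAD makes the dynamic loader resolve malloc/free from jemalloc first.
        LD_PRELOAD="$JEMALLOC_LIB" "$@"
    else
        echo "jemalloc not found at $JEMALLOC_LIB; using the default allocator" >&2
        "$@"
    fi
}

# Smoke test of the wrapper itself.
run_with_jemalloc echo "allocator wrapper works"
```

A real use would look like `run_with_jemalloc clang -O2 -c foo.c -o foo.o`, timed with and without the wrapper. This only affects how fast the compiler runs, not the code it generates.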

koti · October 7, 2022, 6:42pm

Hi,

Thank you for your reply. I am looking for clang optimizations that help when building network code, in order to improve network throughput.

Regards
Koti

tschuett · October 7, 2022, 6:51pm

It really depends on how far you want to go. As said above:
-march=native -flto -fomit-frame-pointer -O3 -DNDEBUG
is a good start. If you want to go further, you can play with PGO and BOLT.
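
As a rough sketch of how these flags are typically applied to a small user-space project: `-flto` has to be given at link time as well as at compile time, and the linker must understand LLVM bitcode, so the sketch assumes `lld` is installed and selects it with `-fuse-ld=lld`. The file names `foo.c`, `bar.c` and the output name `app` are placeholders, not something from this thread.

```shell
# Placeholder sources so the sketch is self-contained.
workdir="$(mktemp -d)"
printf 'int helper(int x) { return x * 2; }\n' > "$workdir/bar.c"
printf 'int helper(int x);\nint main(void) { return helper(21) == 42 ? 0 : 1; }\n' > "$workdir/foo.c"

CFLAGS="-march=native -flto -fomit-frame-pointer -O3 -DNDEBUG"

if command -v clang >/dev/null 2>&1; then
    (
        cd "$workdir" &&
        # Compile each translation unit; with -flto the .o files contain LLVM bitcode.
        clang $CFLAGS -c foo.c -o foo.o &&
        clang $CFLAGS -c bar.c -o bar.o &&
        # Repeat the flags at link time so LTO runs during the final link.
        clang $CFLAGS -fuse-ld=lld foo.o bar.o -o app &&
        ./app && echo "app built and ran"
    ) || echo "build failed (is lld installed?)"
else
    echo "clang not found; skipping the build"
fi
```

Note that `-march=native` ties the binary to the CPU of the build machine, so it is only appropriate when the build host and the target device have the same CPU features.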

koti · October 8, 2022, 7:01am

Hi

Thank you for the inputs. My target is to achieve higher throughput with a Linux image built by clang than with a gcc-built image. As of now, during network testing on the device, throughput with the image built by the clang-14.0.6 compiler is 2.5% lower than with the gcc-built image. So can you please let me know which of the steps in the previous comments would help improve the throughput of the image?

Regards
Koti

koti · October 10, 2022, 9:12am

Hi,
Please ignore my previous comment. I just want to check clang performance with the parameters below, which were mentioned in one of the previous comments.
{{
-march=native -flto -fomit-frame-pointer -O3 -DNDEBUG
}}
So, could you please let me know how to add the ‘-march=native’, ‘-fomit-frame-pointer’ and ‘-O3 -DNDEBUG’ options to the clang package?

Regards
Koti

tschuett · October 10, 2022, 9:22am

You have to pass these parameters to clang, i.e:

clang -march=native -flto -fomit-frame-pointer -O3 -DNDEBUG -c foo.c -o foo.o

koti · October 10, 2022, 9:28am

Dear tschuett,

clang -march=native -flto -fomit-frame-pointer -O3 -DNDEBUG -c foo.c -o foo.o
I am aware of this kind of step, but my requirement is to build the kernel with the clang compiler so that the above flags are passed automatically when compiling all kernel source files. So it seems I need to rebuild clang from source with the above options added, so that the clang compiler passes them by default to the kernel source files during compilation.

Regards
Koti

tobiashieta · October 10, 2022, 9:32am

You can probably use config files for this: Clang Compiler User’s Manual — Clang 16.0.0git documentation (llvm.org)

Or change the kernel build system to pass these flags.
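
A minimal sketch of the config file approach, assuming the explicit `--config` option described in the Clang user manual. The file name `perf.cfg` and the test source `foo.c` are hypothetical, and whether a given build system forwards `--config` to every compiler invocation is something to verify separately.

```shell
workdir="$(mktemp -d)"

# A config file is plain text holding command-line options; lines starting with '#' are comments.
cat > "$workdir/perf.cfg" <<'EOF'
# Flags suggested earlier in this thread.
-march=native
-fomit-frame-pointer
-O3
-DNDEBUG
EOF

# Placeholder source file to compile.
printf 'int main(void) { return 0; }\n' > "$workdir/foo.c"

if command -v clang >/dev/null 2>&1; then
    # The options from perf.cfg are applied as if they had been typed on the command line.
    clang --config "$workdir/perf.cfg" -c "$workdir/foo.c" -o "$workdir/foo.o" \
        && echo "compiled with flags from perf.cfg"
else
    echo "clang not found; skipping the compile step"
fi
```

This avoids rebuilding clang: the flags live in a file next to the toolchain rather than inside the compiler binary.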

koti · October 10, 2022, 10:08am

Dear tobiashieta

Thank you for the input. It is better to pass the required flags via the kernel Makefile than to rebuild the whole clang package.

Regards
Koti
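
For reference, a sketch of passing extra flags through the kernel build system without editing the Makefile, using the `KCFLAGS` variable documented for kbuild. It assumes a kernel recent enough to support `LLVM=1` (older trees use `CC=clang` instead); `build_kernel_with_clang` and its two arguments are hypothetical names used only for illustration.

```shell
# Hypothetical helper: build a kernel tree with clang and extra C flags.
build_kernel_with_clang() {
    kernel_src="$1"    # path to the kernel source tree
    extra_flags="$2"   # additional compiler options, e.g. "-march=native"

    # LLVM=1 selects clang and the LLVM tools; KCFLAGS appends options to every C compile.
    make -C "$kernel_src" LLVM=1 KCFLAGS="$extra_flags" -j"$(nproc)"
}
```

It would be called as `build_kernel_with_clang /path/to/linux "-march=native"`. The kernel sets its own optimization level and architecture flags, so not every user-space flag is meaningful or safe here.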

koti · October 11, 2022, 6:30am

Dear Tobiashieta,

I passed “-march=native”, built the kernel image, and noticed that the performance of the HTTP application dropped further compared to the gcc-built kernel image.
As for the other parameters in the above command: ‘-fomit-frame-pointer’ is already present, and ‘-O3 -DNDEBUG’ is supported only for the ‘arc’ architecture and cannot be used for
the x86_64 arch (which I am using currently).
The ‘-flto’ flag prevents detailed debugging, hence it was not passed for performance testing.

As of now, ‘PGO’ is another good option to try. Can you let me know whether ‘PGO’ helps application performance or just makes compilation faster?
If performance is improved, could you please let me know how ‘PGO’ is enabled for clang in a Docker Buildroot environment?

Regards
Koti

tobiashieta · October 11, 2022, 6:34am

I am no expert on these things - I think there are probably others that know better how to get the most performance out of the Linux kernel and clang. I don’t know if there is a write-up somewhere.

koti · October 11, 2022, 8:14am

Hi,

Thank you for your reply. Meanwhile, can you let me know how to use the memory allocators mentioned above, like jemalloc, rpmalloc etc.? Do you mean just using those allocators in the application code and compiling with clang to get better performance?

LTO (-DLLVM_ENABLE_LTO=THIN)
With the ThinLTO flag, are debug symbols still available for debugging with gdb etc.?
Or is the above config option the same as the ‘-flto’ flag passed to the compiler?

Regards
koti

koti · October 11, 2022, 2:26pm

Hi,

I tried to enable LTO (-DLLVM_ENABLE_LTO=THIN) in clang.mk and build clang in the Buildroot environment, but the clang package build fails with an error.
{{
[ 3%] Building CXX object utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/ClangDataCollectorsEmitter.cpp.o
cc1plus: error: unrecognized argument to ‘-flto=’ option: ‘thin’
cc1plus: error: unrecognized argument to ‘-flto=’ option: ‘thin’
[ 4%] Building CXX object utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/ClangDiagnosticsEmitter.cpp.o
[ 4%] Generating ../../lib/libscanbuild/arguments.py
[ 4%] Generating ../../lib/libscanbuild/clang.py
make[3]: *** [utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/build.make:76: utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/ClangASTNodesEmitter.cpp.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[3]: *** [utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/build.make:63: utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/ASTTableGen.cpp.o] Error 1
}}
The flag was added with the change below to the clang.mk file.
{{
diff --git a/package/clang/clang.mk b/package/clang/clang.mk
index 1fda702..9fdd1c4 100644
--- a/package/clang/clang.mk
+++ b/package/clang/clang.mk
@@ -41,6 +41,9 @@ CLANG_CONF_OPTS += -DBUILD_SHARED_LIBS=OFF
HOST_CLANG_CONF_OPTS += -DCMAKE_BUILD_TYPE=Release
CLANG_CONF_OPTS += -DCMAKE_BUILD_TYPE=Release

+HOST_CLANG_CONF_OPTS += -DLLVM_ENABLE_LTO=Thin
+CLANG_CONF_OPTS += -DLLVM_ENABLE_LTO=Thin
+
CLANG_CONF_OPTS += -DCMAKE_CROSSCOMPILING=1

}}
Please let me know how to enable ‘Thin’ LTO.

Regards
Koti

carlocab · October 11, 2022, 5:38pm

You are using a compiler that doesn’t support -flto=thin. You need a compiler that supports this to use -DLLVM_ENABLE_LTO=Thin.
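
The `cc1plus` errors in the log come from GCC, which has no ThinLTO mode. A sketch of a configuration that would satisfy the requirement, assuming an existing clang and lld are already installed on the build host and a plain CMake build rather than Buildroot; `configure_clang_thinlto` and its arguments are hypothetical names.

```shell
# Hypothetical helper: configure an LLVM/clang build that uses ThinLTO.
configure_clang_thinlto() {
    llvm_src="$1"    # path to an llvm-project checkout
    build_dir="$2"   # directory to create the build tree in

    # The host compiler must itself be clang, and the linker must handle
    # LLVM bitcode, for -DLLVM_ENABLE_LTO=Thin to work.
    cmake -G Ninja -S "$llvm_src/llvm" -B "$build_dir" \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DCMAKE_C_COMPILER=clang \
        -DCMAKE_CXX_COMPILER=clang++ \
        -DLLVM_USE_LINKER=lld \
        -DLLVM_ENABLE_LTO=Thin
}
```

In practice this means a two-step build: first a clang built with the system GCC and no LTO, then a second clang built with the first one and ThinLTO enabled.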

Note, however, that passing -DLLVM_ENABLE_LTO=Thin will make the clang that you build faster, but should not have any impact on anything you build with that clang. This does not seem to be what you want to do, though perhaps I’ve read your comments wrongly.

You can try PGO, but doing this is complicated. You will first need to generate profile data for your application, then build your application using that profile data. See Clang Compiler User’s Manual — Clang 16.0.0git documentation for a guide on how to do this.
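
The PGO workflow described above can be sketched for a small user-space program as follows, using the instrumentation flags from the Clang user manual. The program `hot.c` and all output names are hypothetical, and the sketch assumes `clang`, `llvm-profdata` and the profiling runtime are installed; applying PGO to a kernel build is a separate topic that this does not cover.

```shell
workdir="$(mktemp -d)"

# Placeholder workload standing in for the real application.
cat > "$workdir/hot.c" <<'EOF'
#include <stdio.h>
int main(void) {
    unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000; i++) sum += i % 7;
    printf("%lu\n", sum);
    return 0;
}
EOF

if command -v clang >/dev/null 2>&1 && command -v llvm-profdata >/dev/null 2>&1; then
    (
        cd "$workdir" &&
        # 1. Build an instrumented binary.
        clang -O2 -fprofile-instr-generate hot.c -o hot_instrumented &&
        # 2. Run it on a representative workload; %p expands to the process ID.
        LLVM_PROFILE_FILE="hot-%p.profraw" ./hot_instrumented &&
        # 3. Merge the raw profiles into one indexed profile.
        llvm-profdata merge -output=hot.profdata hot-*.profraw &&
        # 4. Rebuild using the profile to guide optimization.
        clang -O2 -fprofile-instr-use=hot.profdata hot.c -o hot_pgo &&
        echo "PGO build finished"
    ) || echo "PGO steps failed (is the profiling runtime installed?)"
else
    echo "clang or llvm-profdata not found; skipping the PGO steps"
fi
```

The benefit depends on how closely the training run in step 2 matches the real workload, which is the main source of the complexity mentioned above.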

tschuett · October 11, 2022, 7:00pm

Could you provide the output of:

 cc --version

If it does not work, then maybe gcc --version.

Next test:

touch foo.c
cc -flto=thin -c foo.c
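
The same check can be wrapped in a small script so it can be run against several compilers. `check_thinlto` is a hypothetical helper name; the logic is just the two commands above plus a temporary directory.

```shell
# Hypothetical helper: report whether a compiler accepts -flto=thin.
check_thinlto() {
    compiler="$1"
    tmpdir="$(mktemp -d)"
    : > "$tmpdir/foo.c"    # empty translation unit, same as 'touch foo.c'

    if "$compiler" -flto=thin -c "$tmpdir/foo.c" -o "$tmpdir/foo.o" 2>/dev/null; then
        echo "$compiler: supports -flto=thin"
    else
        echo "$compiler: does not support -flto=thin"
    fi
    rm -rf "$tmpdir"
}

# Check the default system compiler.
check_thinlto cc
```

Running `check_thinlto gcc` and `check_thinlto clang` side by side shows which one can serve as the host compiler for a ThinLTO build.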

koti · October 12, 2022, 4:52am

Dear carlocab,

-DLLVM_ENABLE_LTO=Thin will make the clang that you build faster, but should not have any impact on anything you build with that clang . This does not seem to be what you want to do, though perhaps I’ve read your comments wrongly.

Yes. I am looking for better performance from the clang-built image when running the application on a device flashed with that image.

You can try PGO, but doing this is complicated. You will first need to generate profile data for your application, then build your application using that profile data.

Yes, it is a bit complicated, as it requires generating profile data for the application and then using that profile data.

Meanwhile, can you suggest anything else to improve the performance of the application on the clang image?

Regards
Koti

carlocab · October 12, 2022, 3:40pm

To be sure, I’ve interpreted this to mean that you want stuff you built with clang to go fast, and not for clang itself to go fast. Please feel free to clarify if I’ve misunderstood. (Maybe you want both?)

I know of no other methods beyond what is suggested above. You could perhaps try various settings for -march (e.g. -march=haswell, etc) depending on the machine you wish to target. But this will require lots of recompiling and benchmarking so it may prove to not be simpler than PGO.
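
A sketch of what trying several `-march` settings could look like for a user-space benchmark. The candidate list and `bench.c` are hypothetical, the CPU names assume an x86_64 target, and the loop only builds the variants: a binary built for a newer CPU than the one it runs on may crash with an illegal instruction, so each one should be timed only on hardware that supports it.

```shell
workdir="$(mktemp -d)"

# Placeholder benchmark standing in for the real workload.
cat > "$workdir/bench.c" <<'EOF'
#include <stdio.h>
int main(void) {
    double sum = 0.0;
    for (long i = 1; i <= 50000000; i++) sum += 1.0 / (double)i;
    printf("%f\n", sum);
    return 0;
}
EOF

# Hypothetical candidates; adjust to the CPUs actually being targeted.
MARCH_CANDIDATES="x86-64 haswell skylake native"

if command -v clang >/dev/null 2>&1; then
    for arch in $MARCH_CANDIDATES; do
        # Build one binary per -march value so they can be compared later.
        if clang -O3 -march="$arch" "$workdir/bench.c" -o "$workdir/bench-$arch" 2>/dev/null; then
            echo "built $workdir/bench-$arch"
        else
            echo "-march=$arch is not accepted by this clang/target"
        fi
    done
else
    echo "clang not found; skipping the builds"
fi
```

Each resulting binary would then be timed on the target device under the real network workload, since small synthetic loops rarely predict throughput differences.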
