Clang-14.0.6 performance optimization

Dear All,

Could you please let me know which techniques can be used to optimize performance with the clang 14.0.6 compiler when building C/C++ code on a Linux machine?

Regards
Koti

-march=native -flto -fomit-frame-pointer -O3 -DNDEBUG

Dear Wongboo

Thank you for your input. Should the flags mentioned in the previous comment be passed when building the clang package, so that the rebuilt clang binaries/libraries are then used for compiling C/C++ code?

Regards
Koti

Do you want a faster compiler or faster code built with the compiler?

To make your compiler faster the common techniques are:

Hi,

Thank you for your reply. I am looking for clang optimizations that in turn help when building network code, to improve network throughput.

Regards
Koti

It really depends on how far you want to go. As said above:
-march=native -flto -fomit-frame-pointer -O3 -DNDEBUG
is a good start. If you want to go further, you can play with PGO and BOLT.

Hi

Thank you for the inputs. My target is to achieve higher throughput with a Linux image built by clang than with a gcc-built image. As of now, during network testing on the device, throughput with the image built by the clang 14.0.6 compiler is 2.5% lower than with the gcc-built image. So can you please let me know which of the steps in the previous comments would help improve the throughput of the image?

Regards
Koti

Hi,
Please ignore my previous comment. I just want to check clang performance with the parameters below, which were mentioned in one of the previous comments.
{{
-march=native -flto -fomit-frame-pointer -O3 -DNDEBUG
}}
So, could you please let me know how to add the ‘-march=native’, ‘-fomit-frame-pointer’ and ‘-O3 -DNDEBUG’ options to the clang package?

Regards
Koti

You have to pass these parameters to clang, for example:

clang -march=native -flto -fomit-frame-pointer -O3 -DNDEBUG -c foo.c -o foo.o

Dear tschuett,

clang -march=native -flto -fomit-frame-pointer -O3 -DNDEBUG -c foo.c -o foo.o
I am aware of this kind of step, but my requirement is to build the kernel with the clang compiler and have the above flags passed automatically when compiling all kernel source files. So it seems I need to rebuild clang from source with these options added, so that the clang compiler passes them by default when compiling the kernel source files.

Regards
Koti

You can probably use config files for this: Clang Compiler User’s Manual — Clang 16.0.0git documentation (llvm.org)

Or change the kernel build system to pass these flags.
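
A rough sketch of both options, offered as an illustration rather than a tested kernel recipe. A clang configuration file is a plain text file of command-line options that clang reads when given `--config`, and the kernel's Kbuild accepts extra C compiler flags through the `KCFLAGS` variable. The file names `perf.cfg` and `foo.c` and the exact flag set are assumptions made for this example; which flags a given kernel version and architecture accept still has to be checked against that kernel.

```shell
#!/bin/sh
# Option 1: a clang configuration file (plain text, one or more flags per line).
# "perf.cfg" is a hypothetical file name chosen for this sketch.
cat > perf.cfg <<'EOF'
-march=native
-fomit-frame-pointer
-O3
-DNDEBUG
EOF

# Use the config file if clang is installed; foo.c is a throwaway sample file.
printf 'int main(void) { return 0; }\n' > foo.c
if command -v clang >/dev/null 2>&1; then
    clang --config ./perf.cfg -c foo.c -o foo.o \
        || echo "clang rejected one of the flags in perf.cfg"
fi

# Option 2: let the kernel build system add the flags through KCFLAGS.
# Printed rather than executed, since it must run inside a kernel source tree.
KERNEL_BUILD_CMD='make LLVM=1 KCFLAGS="-march=native"'
echo "$KERNEL_BUILD_CMD"
```

The second form leaves the clang installation untouched, so the same compiler can still be used for other projects with different flags.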

Dear tobiashieta

Thank you for the input. It’s better to pass the required flags in the kernel Makefile instead of building the whole clang package again.

Regards
Koti

Dear Tobiashieta,

I passed “-march=native”, built the kernel image, and noticed that the performance of the HTTP application is reduced further compared to the gcc-built kernel image.
Of the other parameters in the above command, ‘-fomit-frame-pointer’ is already present, and ‘-O3 -DNDEBUG’ is supported only for the ‘arc’ architecture and can’t be used for the x86_64 arch (which I am currently using).
The ‘-flto’ flag prevents detailed debugging, hence it was not passed for performance testing.

As of now, ‘PGO’ is another good option to try. Can you let me know whether ‘PGO’ helps application performance or only makes compilation faster?
If performance is improved, could you please let me know how ‘PGO’ can be enabled for clang in a Docker buildroot environment?

Regards
Koti

I am no expert on these things - I think there are probably others that know better how to get the most performance out of the Linux kernel and clang. I don’t know if there is a write-up somewhere.

Hi,

Thank you for your reply. Meanwhile, can you let me know how to use the memory allocators mentioned above, like jemalloc, rpmalloc etc.? Do you mean just using those allocators in the application code and compiling with clang to get better performance?

LTO (-DLLVM_ENABLE_LTO=THIN)
With the Thin LTO flag, are debug symbols still available for debugging with gdb etc.?
Or is the above config option the same as the ‘-flto’ flag passed to the compiler?

Regards
koti

Hi,

I tried to enable LTO (-DLLVM_ENABLE_LTO=THIN) in clang.mk and build clang in the buildroot environment, but the clang package build fails with the error below.
{{
[ 3%] Building CXX object utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/ClangDataCollectorsEmitter.cpp.o
cc1plus: error: unrecognized argument to ‘-flto=’ option: ‘thin’
cc1plus: error: unrecognized argument to ‘-flto=’ option: ‘thin’
[ 4%] Building CXX object utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/ClangDiagnosticsEmitter.cpp.o
[ 4%] Generating …/…/lib/libscanbuild/arguments.py
[ 4%] Generating …/…/lib/libscanbuild/clang.py
make[3]: *** [utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/build.make:76: utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/ClangASTNodesEmitter.cpp.o] Error 1
make[3]: *** Waiting for unfinished jobs…
make[3]: *** [utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/build.make:63: utils/TableGen/CMakeFiles/obj.clang-tblgen.dir/ASTTableGen.cpp.o] Error 1
}}
The flag was added with the change below to the clang.mk file.
{{
diff --git a/package/clang/clang.mk b/package/clang/clang.mk
index 1fda702…9fdd1c4 100644
--- a/package/clang/clang.mk
+++ b/package/clang/clang.mk
@@ -41,6 +41,9 @@ CLANG_CONF_OPTS += -DBUILD_SHARED_LIBS=OFF
HOST_CLANG_CONF_OPTS += -DCMAKE_BUILD_TYPE=Release
CLANG_CONF_OPTS += -DCMAKE_BUILD_TYPE=Release

+HOST_CLANG_CONF_OPTS += -DLLVM_ENABLE_LTO=Thin
+CLANG_CONF_OPTS += -DLLVM_ENABLE_LTO=Thin
+
CLANG_CONF_OPTS += -DCMAKE_CROSSCOMPILING=1

}}
Please let me know how to enable ‘Thin’ LTO support.

Regards
Koti

You are using a compiler that doesn’t support -flto=thin. You need a compiler that supports this to use -DLLVM_ENABLE_LTO=Thin.

Note, however, that passing -DLLVM_ENABLE_LTO=Thin will make the clang that you build faster, but should not have any impact on anything you build with that clang. This does not seem to be what you want to do, though perhaps I’ve read your comments wrongly.

You can try PGO, but doing this is complicated. You will first need to generate profile data for your application, then build your application using that profile data. See Clang Compiler User’s Manual — Clang 16.0.0git documentation for a guide on how to do this.
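
For orientation, the instrumentation-based PGO flow described in the manual can be sketched roughly as follows. This is a minimal illustration for a user-space program, not a recipe for a kernel or buildroot image, which this sketch does not cover. The file names (`pgo_demo.c`, `pgo_demo.profraw`, `pgo_demo.profdata`) and the toy workload are assumptions made for the example; a real profile should come from running the actual application under a representative load.

```shell
#!/bin/sh
# Tiny stand-in program; in practice this would be the real application.
cat > pgo_demo.c <<'EOF'
#include <stdio.h>
int main(void) {
    unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000; ++i)
        sum += i % 7;
    printf("%lu\n", sum);
    return 0;
}
EOF

if command -v clang >/dev/null 2>&1 && command -v llvm-profdata >/dev/null 2>&1; then
    # 1. Build an instrumented binary.
    clang -O2 -fprofile-instr-generate pgo_demo.c -o pgo_demo_instr
    # 2. Run a representative workload; the raw profile is written on exit.
    LLVM_PROFILE_FILE=pgo_demo.profraw ./pgo_demo_instr
    # 3. Merge raw profiles into the indexed format clang consumes.
    llvm-profdata merge -output=pgo_demo.profdata pgo_demo.profraw
    # 4. Rebuild the program using the collected profile.
    clang -O2 -fprofile-instr-use=pgo_demo.profdata pgo_demo.c -o pgo_demo_opt
else
    echo "clang and/or llvm-profdata not found; skipping the PGO steps"
fi
```

PGO here affects the code that clang generates for the application. It is a separate matter from building clang itself with PGO, which only makes the compiler run faster.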

Could you provide the output of:

 cc --version

If it does not work, then maybe gcc --version.

Next test:

touch foo.c
cc -flto=thin -c foo.c
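
The same check can be wrapped in a small helper that probes several compilers at once. This is only a convenience sketch: the function name `supports_thinlto` is made up for this example, and the list of compilers to probe is an assumption about what might be installed on the build host.

```shell
#!/bin/sh
# supports_thinlto COMPILER: returns 0 if COMPILER accepts -flto=thin,
# non-zero otherwise. A one-line C program is fed in on stdin.
supports_thinlto() {
    printf 'int main(void) { return 0; }\n' |
        "$1" -flto=thin -x c -c - -o /dev/null 2>/dev/null
}

for compiler in cc gcc clang; do
    if ! command -v "$compiler" >/dev/null 2>&1; then
        echo "$compiler: not installed"
    elif supports_thinlto "$compiler"; then
        echo "$compiler: accepts -flto=thin"
    else
        echo "$compiler: rejects -flto=thin"
    fi
done
```

If the host `cc` turns out to be GCC, it will reject the option, which matches the `cc1plus` error in the build log above.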

Dear carlocab,

-DLLVM_ENABLE_LTO=Thin will make the clang that you build faster, but should not have any impact on anything you build with that clang. This does not seem to be what you want to do, though perhaps I’ve read your comments wrongly.

Yes. I am expecting better performance from the image built with the clang compiler, when running the application on a device flashed with the clang-built image.

You can try PGO, but doing this is complicated. You will first need to generate profile data for your application, then build your application using that profile data.

Yes. It’s a bit complicated, as it requires generating profile data for the application and then using that profile data.

Meanwhile, can you let me know of any further ways to improve application performance on the clang-built image?

Regards
Koti

To be sure, I’ve interpreted this to mean that you want stuff you built with clang to go fast, and not for clang itself to go fast. Please feel free to clarify if I’ve misunderstood. (Maybe you want both?)

I know of no other methods beyond what is suggested above. You could perhaps try various settings for -march (e.g. -march=haswell, etc) depending on the machine you wish to target. But this will require lots of recompiling and benchmarking so it may prove to not be simpler than PGO.
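
A small loop makes the recompile step of such an experiment less tedious. The sketch below is an assumption-laden illustration: the candidate list, the placeholder source file `march_demo.c`, and the `-O2` level are arbitrary choices for the example, and the benchmarking itself still has to be done on the target device with the real workload.

```shell
#!/bin/sh
# Candidate -march values to compare; adjust to the CPUs you actually target.
MARCH_CANDIDATES="x86-64 haswell skylake"

# Placeholder source file standing in for the real code base.
printf 'int main(void) { return 0; }\n' > march_demo.c

for march in $MARCH_CANDIDATES; do
    out="march_demo_$march"
    if command -v clang >/dev/null 2>&1; then
        # Build one binary per candidate so they can be benchmarked side by side.
        clang -O2 -march="$march" march_demo.c -o "$out" \
            && echo "built $out; benchmark it on the target machine"
    else
        echo "would run: clang -O2 -march=$march march_demo.c -o $out"
    fi
done
```

A binary built for a newer microarchitecture may fail with an illegal-instruction error on an older CPU, so each variant should only be run on hardware that supports it.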