Hi,
I am working on a project that currently compiles an Arm application for an Arm Cortex-A9 target. We use GCC 13 and ld with LTO. Our optimization level is -O2 for release builds.
Our GCC flash footprint of a stripped application (--strip-all) is 29402 KB.
Our ld link time is ~10 min.
Both the GCC and Clang builds run on Windows 11.
See details of the Clang compiler build at the bottom of this post.
As an experiment to reduce link time, we have tried to compile our application with Clang 18.1.3 and lld using -flto=thin -Wl,--thinlto-jobs=0 -Wl,--thinlto-cache-policy=cache_size_bytes=5g.
Our Clang flash footprint of a stripped application (--strip-all) is 33670 KB.
Our lld link time is ~2 min, and incremental links take ~15 s, which is great.
But as can be seen, the footprint has also increased:
GCC 29402 KB → Clang 33670 KB (+4268 KB)
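A quick way to double-check the size delta and put it in relative terms is a few lines of Python. This sketch only reuses the stripped footprints quoted above (in kilobytes); nothing else is assumed.

```python
# Stripped flash footprints quoted above, in kilobytes.
gcc_lto_kb = 29402
clang_thinlto_kb = 33670

# Absolute and relative growth of the Clang ThinLTO build over the GCC LTO build.
delta_kb = clang_thinlto_kb - gcc_lto_kb
growth_pct = 100.0 * delta_kb / gcc_lto_kb

print(f"Clang - GCC = {delta_kb} KB ({growth_pct:.1f} % larger)")
```

With these numbers the Clang image is 4268 KB, or about 14.5 %, larger than the GCC image.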
4268 KB is quite a large increase in footprint, and it could be a dealbreaker for us.
The increase makes me wonder whether we are missing something when we run ThinLTO.
Hence I have also done a build using -fno-lto.
The Clang -fno-lto build yields a footprint of 33802 KB.
So from a footprint perspective, Clang ThinLTO saves only 33802 KB - 33670 KB = 132 KB.
For comparison, if I build without LTO using GCC and ld, I get a footprint of 32494 KB.
GCC footprint savings using LTO: 32494 KB - 29402 KB = 3092 KB.
I did not expect the code-size savings from Clang ThinLTO to be so meager (132 KB); I expected something of the same order of magnitude as GCC (3092 KB).
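The LTO savings of the two toolchains can be compared side by side with a small sketch that reuses only the numbers quoted above. It also splits the final Clang-vs-GCC gap into two parts: the gap that already exists without LTO, and the difference in how much each LTO implementation saves. The helper name `lto_savings` is purely illustrative.

```python
def lto_savings(no_lto_kb: int, lto_kb: int) -> tuple[int, float]:
    """Return (KB saved by LTO, saving as a percentage of the non-LTO build)."""
    saved_kb = no_lto_kb - lto_kb
    return saved_kb, 100.0 * saved_kb / no_lto_kb


# Stripped footprints quoted above, in kilobytes: (without LTO, with LTO).
clang = (33802, 33670)  # -fno-lto vs. -flto=thin
gcc = (32494, 29402)    # no LTO vs. LTO

clang_saved, clang_pct = lto_savings(*clang)
gcc_saved, gcc_pct = lto_savings(*gcc)
print(f"Clang ThinLTO saves {clang_saved} KB ({clang_pct:.1f} %)")
print(f"GCC LTO saves {gcc_saved} KB ({gcc_pct:.1f} %)")

# Decompose the final Clang-vs-GCC gap (LTO builds) into two parts.
baseline_gap_kb = clang[0] - gcc[0]   # gap already present without LTO
lto_gap_kb = gcc_saved - clang_saved  # extra savings achieved by GCC's LTO
total_gap_kb = clang[1] - gcc[1]
assert baseline_gap_kb + lto_gap_kb == total_gap_kb
print(f"baseline gap {baseline_gap_kb} KB + LTO gap {lto_gap_kb} KB = {total_gap_kb} KB")
```

With these numbers the split is 1308 KB + 2960 KB = 4268 KB. Roughly 70 % of the gap therefore comes from the difference in LTO savings, and the rest is already visible in the non-LTO builds.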
Since the results above show that our application's footprint is 4268 KB larger than with our current GCC builds, I have also tried an -Os build that optimizes more for size.
Clang ThinLTO -Os vs. GCC LTO -Os:
Binary size:
Clang: 32190 KB
GCC: 26690 KB
Idle thread (share of CPU time, higher is better):
Clang: 64.5 %
GCC: 70.4 %
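The -Os numbers can be turned into a size gap and a relative CPU load. This sketch assumes that CPU load is simply 100 % minus the time spent in the idle thread. That assumption is not stated in the measurements above.

```python
# -Os results quoted above.
clang_os_kb, gcc_os_kb = 32190, 26690
clang_idle_pct, gcc_idle_pct = 64.5, 70.4

size_gap_kb = clang_os_kb - gcc_os_kb  # 5500 KB

# Assumption: CPU load = 100 % minus the share of time spent in the idle thread.
clang_load_pct = 100.0 - clang_idle_pct
gcc_load_pct = 100.0 - gcc_idle_pct
load_ratio = clang_load_pct / gcc_load_pct

print(f"-Os size gap: {size_gap_kb} KB")
print(f"CPU load: Clang {clang_load_pct:.1f} % vs. GCC {gcc_load_pct:.1f} % "
      f"(Clang uses {100 * (load_ratio - 1):.0f} % more CPU time)")
```

Under that assumption, the Clang -Os build spends about 20 % more CPU time than the GCC -Os build (35.5 % vs. 29.6 % load).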
Both footprint and runtime performance are worse for Clang at -Os. I have not yet been able to run Clang -O2 builds due to run-time errors.
My question to the forum is: what can I check to figure out what goes wrong? It is quite opaque which optimizations ThinLTO actually applies. Is it possible to see what it does and does not do, so I can narrow down whether something is wrong with our Clang builds? Any ideas on what I can check?
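One way to narrow down where the extra size comes from is to compare per-symbol sizes of the two unstripped ELF files. This sketch assumes the symbol tables were dumped beforehand with `nm --print-size --size-sort` (GNU nm or llvm-nm) without `-C`. That way each line looks like `address size type name` and the names contain no spaces. The helpers `symbol_sizes` and `biggest_growth` are hypothetical. The sample input is made up only to show the format and is not measured data.

```python
from collections import defaultdict


def symbol_sizes(nm_output: str) -> dict[str, int]:
    """Sum symbol sizes per name from `nm --print-size` text output.

    Expected line shape: '<address> <size> <type> <name>' with hex numbers.
    Lines without a size field (e.g. undefined symbols) are skipped.
    """
    sizes: dict[str, int] = defaultdict(int)
    for line in nm_output.splitlines():
        parts = line.split(maxsplit=3)
        if len(parts) != 4:
            continue
        _addr, size_hex, _kind, name = parts
        try:
            sizes[name] += int(size_hex, 16)
        except ValueError:
            continue
    return dict(sizes)


def biggest_growth(old: dict[str, int], new: dict[str, int], top: int = 10):
    """Return (delta_bytes, name) for the `top` symbols that grew the most."""
    names = old.keys() | new.keys()
    deltas = [(new.get(n, 0) - old.get(n, 0), n) for n in names]
    deltas.sort(reverse=True)
    return deltas[:top]


# Hypothetical sample dumps, only to illustrate the expected input format.
gcc_nm = """\
00010000 00000040 T main
00010040 00000100 T _Z6filterPKfi
"""
clang_nm = """\
00010000 00000048 T main
00010048 00000300 T _Z6filterPKfi
00010348 00000020 t helper.llvm.123
"""

top_growth = biggest_growth(symbol_sizes(gcc_nm), symbol_sizes(clang_nm), top=3)
for delta, name in top_growth:
    print(f"{delta:+8d}  {name}")
```

The comparison is only exact for symbols whose names survive both builds. Functions that LTO inlines, clones, or renames (for example with suffixes such as `.llvm.<hash>` or `.lto_priv.0`) show up as removed or added entries rather than as growth, so those need a manual look.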
To detail which Clang toolchain I am using:
My build is based on Clang 18.1.3 and newlib 4.3.0.
I am using ARM-software/LLVM-embedded-toolchain-for-Arm on GitHub, a project dedicated to building an LLVM toolchain for 32-bit Arm embedded targets.
The runtimes are built on Ubuntu for Arm and Thumb mode: armv7a_hard_neon_exn_rtti and armv7a_thumb_hard_neon_exn_rtti.
The bulk of our application runs in Thumb mode; only our board support package (BSP) runs in Arm mode.
Our Clang toolchain is built on Windows 11.
I hope someone here can give some clues on what to check or investigate to get to the bottom of what we are doing wrong, or whether Clang is simply outperformed by GCC and ld for our setup. I can provide additional details on our builds on request.