Clang compiled binary significantly slower than gcc compiled binary

I’m seeing a significant slowdown with a Clang-compiled binary compared to a gcc-compiled binary. I have many other benchmarks, and only a very small percentage fall into this category where the Clang binary underperforms significantly.

Modulo warning suppression, the compile flags are identical, and both binaries produce identical results for all the benchmarks I have, including the one whose runtime information I’m sharing below.

Clang 15.0.0 (I tried 15.0.1 as well, but the difference remains)
Gcc 11.2.0

Now, my question is: how do I find out why Clang performs so badly? I tried profiling, but it wasn’t helpful, as the slowdown seems to be spread across the whole program rather than specific to any function.
What suggestions do others have for narrowing it down?
Let me know if any more information is needed to comment further.

Clang compiled binary run:
total time (s): user 1183 sys 26 real 1209 ; memory (MB): curr 6869 prog-peak 6869

gcc compiled binary run:
total time (s): user 409 sys 7 real 423 ; memory (MB): curr 6873 prog-peak 6873

It looks like that’s a huge regression, so there should be functions that get significantly slower which should show up during profiling. Then you’d have to drill down and check the differences in the generated code for those functions and go from there.

Another thing to look out for is differences in inlining between GCC/Clang. If there are, you could try to use always_inline / noinline attributes to eliminate those differences.
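
To illustrate the attribute approach, here is a minimal sketch. The function names (`mix`, `checksum`) and the macro names are hypothetical and only stand in for whatever helpers differ in the real code; the attributes themselves are accepted by both GCC and Clang.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical convenience macros wrapping the GCC/Clang attributes.
#define FORCE_INLINE inline __attribute__((always_inline))
#define NEVER_INLINE __attribute__((noinline))

// Hypothetical small hot helper: ask both compilers to inline it
// at every call site, regardless of their own heuristics.
FORCE_INLINE std::uint64_t mix(std::uint64_t x) {
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

// Hypothetical larger routine: keep it out of line in both compilers,
// so it shows up as its own symbol in each profile.
NEVER_INLINE std::uint64_t checksum(const std::uint64_t* data, std::size_t n) {
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += mix(data[i]);
    }
    return acc;
}
```

Pinning the inlining decision this way makes the two profiles easier to compare function by function. To see where the compilers currently disagree, Clang’s `-Rpass=inline` / `-Rpass-missed=inline` remarks and GCC’s `-fopt-info-inline` output may help, though on a large code base they can be very verbose.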

If the benchmark is public, it might be good to share details on how to reproduce the difference, in case other people want to take a look as well.

Finally, do you know if older Clang versions also generate slow code or is this a recent Clang regression?

Cheers

Thanks for your reply.

Indeed, it’s a big proprietary regression test and can’t be shared. I profiled both binaries and, as I mentioned before, there is no smoking gun. In terms of percentage of total time, all functions appear to follow the same pattern in both the gcc-compiled and the Clang-compiled binary.

Previously, gcc was 8.3, Clang was 13.0.1, and the platform was CentOS 6; the Clang-compiled binary always used to outperform the gcc-compiled one. Now gcc is 11.2, Clang is 15.0.1, and the platform is CentOS 7. In fact, Clang 13.0.1 on CentOS 7 showed a similar slowdown. So I can’t tell for sure whether gcc gained some optimization, Clang is missing some optimization, or something else is going on.