Techniques for runtime codepath selections utilizing new ISAs


I’ve written a codebase built on a templated SIMD math library, which lets me write all algorithms against generic scalar/vector types. Entry points then switch on the available instruction sets (AVX, SSE, etc.) at runtime and invoke the best supported codepath, so a single binary supports every platform from old Pentiums to the newest CPUs, while the critical code still runs optimally.
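To make the layout concrete, here is a minimal sketch of that pattern. It is not the actual library: `ScalarF`, `Sse4F`, `mul_add` and `mul_add_dispatch` are names made up for illustration, and it assumes GCC or Clang for `__builtin_cpu_supports`. It only uses SSE, which is enabled whenever the build baseline already defines `__SSE__`, so it compiles without any extra flags.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics, only pulled in when the baseline has SSE
#endif

// Hypothetical one-lane "vector" type: the portable fallback.
struct ScalarF {
    static constexpr std::size_t width = 1;
    float v;
    explicit ScalarF(float x) : v(x) {}
    static ScalarF load(const float* p) { return ScalarF(*p); }
    void store(float* p) const { *p = v; }
    friend ScalarF operator*(ScalarF a, ScalarF b) { return ScalarF(a.v * b.v); }
    friend ScalarF operator+(ScalarF a, ScalarF b) { return ScalarF(a.v + b.v); }
};

#if defined(__SSE__)
// Hypothetical four-lane type wrapping an SSE register.
struct Sse4F {
    static constexpr std::size_t width = 4;
    __m128 v;
    explicit Sse4F(__m128 x) : v(x) {}
    static Sse4F load(const float* p) { return Sse4F(_mm_loadu_ps(p)); }
    void store(float* p) const { _mm_storeu_ps(p, v); }
    friend Sse4F operator*(Sse4F a, Sse4F b) { return Sse4F(_mm_mul_ps(a.v, b.v)); }
    friend Sse4F operator+(Sse4F a, Sse4F b) { return Sse4F(_mm_add_ps(a.v, b.v)); }
};
#endif

// The algorithm is written once against the generic type V:
// out[i] = a[i] * b[i] + c[i]
template <typename V>
void mul_add(const float* a, const float* b, const float* c,
             float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + V::width <= n; i += V::width)
        (V::load(a + i) * V::load(b + i) + V::load(c + i)).store(out + i);
    for (; i < n; ++i)  // scalar tail for leftover elements
        out[i] = a[i] * b[i] + c[i];
}

// Entry point: checks the CPU at runtime and picks an instantiation.
void mul_add_dispatch(const float* a, const float* b, const float* c,
                      float* out, std::size_t n) {
    assert(a && b && c && out);
#if defined(__SSE__)
    if (__builtin_cpu_supports("sse")) {
        mul_add<Sse4F>(a, b, c, out, n);
        return;
    }
#endif
    mul_add<ScalarF>(a, b, c, out, n);
}
```

Adding a wider ISA means adding another wrapper type and another branch in the entry point; the algorithm template itself stays untouched.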

Of course, the whole codebase had to be compiled with no CPU/arch extensions enabled, to ensure the compiler doesn’t insert unsupported instructions in non-controlled code paths.

On every compiler I’ve used, this worked fine. My Xcode/clang compiler was pretty dated though (the last Xcode 5 version). Unfortunately, users compiling and using my open source projects are now reporting that nothing compiles anymore.

At some point, this functionality was broken as clang now emits errors like this:

error: always_inline function '_mm256_and_ps' requires target feature 'avx', but would be inlined into function 'vand' that is compiled without support for 'avx'
inline v8sf vand(v8sf a, v8sf b) { return _mm256_and_ps(a, b); }

Suffice to say, this completely broke all of my projects. However, I know this is a caveat of relying on non-standardized behaviour, and I’m pretty sure there are reasons for avoiding mixing code with different targets. Still, seeing as this technique is immensely beneficial for performance and legacy support, I’m extremely interested in making it work (somehow).

As far as I know, the only current way to make this work is to compile every source file with different compiler flags. This is, however, very tedious and inflexible: it requires manual maintenance of every source file every time something changes, as well as completely separating all code requiring vector operations from normal code. Additionally, you now have to specialize all code manually for each ISA instead of relying on template code generation, creating massive code duplication and a maintenance burden.

The other alternative is to compile separate binaries for each supported platform…

I’m not 100% sure, but I think that Clang’s support for ‘__attribute__((target(…)))’ might help.



The GCC equivalent does allow you to have multiple target-specific solutions in the same source. However, I don’t see a mention of ‘target_clones’ for x86 in Clang, which is possibly closer to what you need.
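As a rough, hedged sketch of what the attribute approach could look like for the `vand` helper from the error message: the file is compiled without `-mavx`, and only the functions marked with the target attribute are allowed to use AVX. This assumes GCC, or a Clang version recent enough to support per-function targets on x86. `and_arrays`, `and_arrays_avx`, `and_arrays_scalar`, `and_bits` and the `HAVE_X86_TARGET_ATTR` macro are hypothetical names, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_X86_TARGET_ATTR 1  // hypothetical macro, for this sketch only
#endif

// Portable fallback: bitwise AND of the raw bit patterns of two floats.
static float and_bits(float x, float y) {
    std::uint32_t xi, yi;
    std::memcpy(&xi, &x, sizeof xi);
    std::memcpy(&yi, &y, sizeof yi);
    xi &= yi;
    std::memcpy(&x, &xi, sizeof x);
    return x;
}

static void and_arrays_scalar(const float* a, const float* b,
                              float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = and_bits(a[i], b[i]);
}

#ifdef HAVE_X86_TARGET_ATTR
typedef __m256 v8sf;

// AVX is enabled for this one function only, so the always_inline
// intrinsic has a caller with a matching target to be inlined into.
__attribute__((target("avx")))
inline v8sf vand(v8sf a, v8sf b) { return _mm256_and_ps(a, b); }

// Every caller of vand needs the same attribute, otherwise the
// original error comes back one level up.
__attribute__((target("avx")))
static void and_arrays_avx(const float* a, const float* b,
                           float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(out + i,
                         vand(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
    for (; i < n; ++i) out[i] = and_bits(a[i], b[i]);  // scalar tail
}
#endif

// Entry point compiled for the baseline target; it only *calls* AVX code
// after confirming at runtime that the CPU supports it.
void and_arrays(const float* a, const float* b, float* out, std::size_t n) {
#ifdef HAVE_X86_TARGET_ATTR
    if (__builtin_cpu_supports("avx")) {
        and_arrays_avx(a, b, out, n);
        return;
    }
#endif
    and_arrays_scalar(a, b, out, n);
}
```

One caveat with this approach is that the attribute has to be spelled on each function, so it does not obviously compose with a single template instantiated per ISA; whether that limitation is acceptable depends on how the library is structured.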