Intel intrinsics on clang-cl

Hello everyone!

Recently I was compiling Qt 6.1 with a wide variety of toolchains on
Windows. One of them was clang-cl, where the build sadly failed
because _rdrand64_step was not declared.
Someone had already reported the bug to Qt at
https://bugreports.qt.io/browse/QTBUG-88434 but they won't fix it, and
rightfully so. With further research I discovered that there are many
intrinsics in immintrin.h and associated headers that are simply
disabled when using clang-cl unless explicitly enabled by various
feature macros (ones that aren't documented for clang-cl).

A few examples:
_rdrand16_step, _rdrand32_step and, on x86_64, _rdrand64_step are behind
the guard:

#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
    defined(__RDRND__)

So if one wanted to use these intrinsics, one would have to define
__RDRND__ on the command line. This is contrary to how MSVC handles
it, where these intrinsics are always available during compilation.
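
To see which way the guard goes for a given toolchain, the condition
can be copied into a small probe. This is only a sketch: the names
rdrand_guard_open and report_rdrand_guard are made up for illustration,
and the fallback definition of __has_feature is an assumption added so
that the file also compiles with compilers that lack it.

```c
#include <stdio.h>

/* __has_feature is a clang extension; fall back to 0 where it is missing. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Hypothetical helper: returns 1 if the guard quoted above would let the
   rdrand intrinsics through, 0 if it would hide them. */
int rdrand_guard_open(void)
{
#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
    defined(__RDRND__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints the result, e.g. for use in a build check. */
void report_rdrand_guard(void)
{
    printf("rdrand intrinsics visible: %d\n", rdrand_guard_open());
}
```

Going by the guard, a clang-cl build should report 0 unless __RDRND__
is defined or modules are enabled, while a non-MSVC target should
report 1.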

Similarly, many BMI intrinsics such as _blsi_u32, _blsmsk_u32 or
_blsr_u32 sit behind the same kind of guard, except that it checks for
the __BMI__ macro instead.
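
For reference, what these three intrinsics compute can also be written
in portable C, which may serve as a stopgap while the guard hides them.
The sketch below follows the documented behaviour of the BLSI, BLSMSK
and BLSR instructions; the portable_* function names are made up for
illustration and are not part of any header.

```c
#include <stdint.h>

/* Stand-in for _blsi_u32: isolate the lowest set bit (0 if x is 0). */
uint32_t portable_blsi_u32(uint32_t x)
{
    return x & (0u - x);
}

/* Stand-in for _blsmsk_u32: set all bits up to and including the
   lowest set bit (all ones if x is 0). */
uint32_t portable_blsmsk_u32(uint32_t x)
{
    return x ^ (x - 1u);
}

/* Stand-in for _blsr_u32: clear the lowest set bit (0 if x is 0). */
uint32_t portable_blsr_u32(uint32_t x)
{
    return x & (x - 1u);
}
```

For x = 12 (binary 1100) these give 4, 7 and 8 respectively. Compilers
typically recognise such patterns and may emit the BMI instructions
themselves when the target allows it, though that is not guaranteed.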

A git log revealed that these header guards were introduced in
https://github.com/llvm/llvm-project/commit/379a1952b37247975d2df8d23498675c9c8cc730
for the sake of compile time. I think this makes a lot of sense for
the various AVX intrinsics and similar, as these need to be
accompanied by a /arch: option on the command line anyway, but the
intrinsics above have no such option.

I am now wondering whether it would make sense to enable intrinsics
such as the above (ones that don't depend on /arch:) by default when
compiling for an MSVC target. This seems to be what MSVC users expect.
Besides, without reading clang's headers it is currently impossible to
find out how to enable these intrinsics, as it is undocumented which
macros one would have to define on the command line.

Kind regards,

Markus

It looks to me like the rdrand feature is a special case. For any other feature, you can include the feature-specific header, such as bmiintrin.h, after including immintrin.h, and it will work out. The rdrand intrinsics, however, are defined inline in immintrin.h, so you have to resort to adjusting compiler flags to get those definitions.

I can try to remember and recount some background about how we got here. Intel at some point decided that it would be a good idea to make immintrin.h an umbrella header. Umbrella headers are usually really bad for compile time, so IMO this was a huge mistake. When Intel added the AVX intrinsics to immintrin.h, we in Chrome noticed that our Windows compile time regressed significantly. This is because various STL headers at the time included intrin.h, which includes immintrin.h, which dragged in all the AVX stuff. When we noticed the regression, Nico went ahead and added back those ifdefs. It’s not ideal, but it’s also unreasonable to spend XX% of every compile that uses parsing AVX512 intrinsics that nobody ever uses.

There was a follow-up proposal to build a module for immintrin.h to solve the compile time problems that way. However, the effort hasn’t gone anywhere.