[RFC] Testing strategy for X86 CPU/feature detection

As a part of my cleanup of the compiler-rt/X86 target parsing/feature detection (#97856, #97861, #97863, #97872, #97877, a couple more time), I’m interested in implementing integration tests (as they currently don’t seem to exist). The cleanup is intended to unify the LLVM/compile-rt implementation by at least having two identical .inc files containing all the common code that are validated as identical. Based on discussion in Code sharing between compiler-rt and clang/llvm - #9 by MaskRay, it also seems like we could just have a single copy, assuming the correct CMake options, but either way, the unification I think will make the code significantly more maintainable. The challenge for this is we would need to emulate CPUID for the test in a clean way. I guess that’s possible with ptrace, but that seems unnecessarily complicated. After [compiler-rt][X86] Use functions in cpuid.h instead of inline assembly by boomanaiden154 · Pull Request #97877 · llvm/llvm-project · GitHub, we’ll be calling functions defined in a header (cpuid.h) however.

I’m thinking a reasonable way to setup the test would be to redefine __get_cpuid and __get_cpuid_count in another translation unit next to the test and then link that first/weaken the symbols in the test so that I can set arbitrary values to come from CPUID and then test functionality like __builtin_cpu_supports and __builtin_cpu_is so that we can validate that we get the expected features/CPU name/microarch name. We could do something like set register values in a struct that the test __get_cpuid and __get_cpuid_count implementations will read from and return.

This would help test that everything is synced correctly between the runtime implementation and the LLVM side. It would also make sure that the runtime implementation works as expected, barring an issue in clang’s cpuid.h, but that is pretty well tested and I would assume pretty battle tested by this point. The setup should be improved after I finish refactorings, but is currently very fragile, with manual struct layout being done to ensure proper sync between compiler-rt and a preprocessor generated table in LLVM:

Even assuming all the refactorings go through, having additional test coverage always seems like a good thing.

Thoughts on the testing strategy? Any possible problems? Things that could be done to improve it?

CC: @RKSimon @MaskRay @phoebe @KanRobert @FreddyYe

If you are doing major changes anyway, can’t we hide the builtins in classes?

class CPU {};
class RealCPU : public CPU {};
class MockCPU: public CPU {};

Then we can rely on GoogleTest for testing.

We can’t put __get_cpuid or __get_cpuid_count in a class as they’re in a clang header that needs to work with C (cpuid.h). That’s what we need to emulate for a decent test, and the header definition is sort of fixed given compatibility issues.

It’s very possible I’m not understanding your suggestion however.

Relying on GoogleTest is definitely the plan. Just with a little linker magic at build time to insert the fake __get_cpuid and __get_cpuid_count implementations. It seems a bit hacky, but I cannot think of a cleaner way to do it.

Sorry, for being too brief:

class CPU {
public:
  virtual XXX getCpuCount() = 0;
  virtual YYY getCpuId() = 0;
};

Then we can inherit from CPU for RealCpu and MockCpu and overwrite the needed functions.

I’m still not sure how this is supposed to work. Essentially we want to have the tests look like this:

TEST(TestBuiltinCPU, Skylake) {
  cpuid_register_values = {
    // set cpuid return register values here.
  }
  EXPECT_EQ(__builtin_cpu_is(), 'skylake')
  EXPECT_EQ(__builtin_cpu_supports('sse4.2'), 1)
  EXPECT_EQ(__builtin_cpu_supports('sse4.1'), 1)
  EXPECT_EQ(__builtin_cpu_supports('AVX512'), 0)
}

The builtins used there (__builtin_cpu_is and __builtin_cpu_supports) grab their data from a global struct setup by __builtin_cpu_init, a call for which is inserted by the compiler before the other builtins in most circumstances. If I’m understanding what you’re saying correctly, you’re suggesting doing something like passing an implementation of the proposed CPU class to __builtin_cpu_init? I don’t think that’s feasible given how these functions are used by the compiler and adding additional parameters would need changes on the compiler side that don’t make a lot of sense.