[test-suite] r261857 - [cmake] Add support for arbitrary metrics

Let’s move this to llvm-dev. I should describe my goals and motivation for the work I have been putting into the llvm-test-suite lately. This is how I see the llvm-test-suite today:

  • We provide a familiar cmake build system so people have a known environment to tweak compilation flags.
  • Together with the benchmark executable we build a .test file that describes how to invoke the benchmark and can be run by the familiar llvm-lit tool:
  • Running a benchmark means executing its executable with a certain set of flags. Some of the SPEC benchmarks even require multiple invocations with different flags.
  • There is a set of steps to verify that the benchmark worked correctly. This usually means invoking “diff” or “fpcmp” and comparing the results with a reference file.
  • The lit benchmark driver modifies these benchmark descriptions to create a test plan. In the simplest case this means prefixing the executable with “timeit” and collecting the measured time. But we are adding more features, like collecting code size, running the benchmark on a remote device, prefixing different instrumentation tools such as the Linux “perf” tool, a utility task that collects and merges PGO data files after a benchmark run, …
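
The test-plan idea described in the list above could be sketched roughly like this. It is only an illustration under stated assumptions, not the actual lit driver code: the names `TestPlan` and `wrap_with_prefix` and the example commands are hypothetical, and any options the real “timeit” tool takes are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TestPlan:
    """Hypothetical stand-in for a benchmark description read from a .test file."""
    run: List[str] = field(default_factory=list)     # commands that execute the benchmark
    verify: List[str] = field(default_factory=list)  # commands that check its output


def wrap_with_prefix(plan: TestPlan, prefix: str) -> TestPlan:
    """Return a new plan whose run commands are prefixed with an instrumentation tool.

    The verification steps are copied unchanged, and the input plan is not modified.
    """
    return TestPlan(
        run=[f"{prefix} {cmd}" for cmd in plan.run],
        verify=list(plan.verify),
    )


# A benchmark that needs two invocations with different flags (made-up commands).
plan = TestPlan(
    run=["./bench -a", "./bench -b"],
    verify=["diff bench.out bench.reference_output"],
)

timed = wrap_with_prefix(plan, "timeit")
for cmd in timed.run + timed.verify:
    print(cmd)
```

The point of the design is that the benchmark description stays the same while the driver decides what to wrap around it, so swapping “timeit” for another prefix does not require touching the benchmark.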

This allows us to add new instrumentation and metrics in the future without touching the benchmarks themselves. It works best for bigger benchmarks that run for a while (a few seconds minimum). It works nicely with benchmark suites like SPEC, geekbench, mediabench… Let’s call this “macro benchmarking”.

Having said all that, you make a very good case for what we should call “micro benchmarking”. The Google benchmark library does indeed look like a fantastic tool. We should definitely evaluate how we can integrate it into the llvm test-suite; we can think of it as a new flavor of benchmarks. We won’t be able to redesign SPEC, but we can surely find things like TSVC which we could adapt to this. I have no immediate plans to put much more work into the test-suite, but I agree that micro benchmarking would be an exciting addition to our testing strategy. I’d be happy to review patches or talk through possible designs on IRC.
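
To make the contrast with macro benchmarking concrete, here is a minimal sketch of what a micro benchmark does: it times a small kernel in-process and repeats it until the measurement is long enough to be meaningful. Google benchmark is a C++ library with its own API; this Python sketch only illustrates the general idea, and the names `micro_benchmark` and `saxpy_kernel` are hypothetical.

```python
import time
from typing import Callable


def micro_benchmark(kernel: Callable[[], object], min_time: float = 0.05) -> float:
    """Return the average seconds per call of `kernel`.

    The iteration count grows by 10x until one timed loop takes at least
    `min_time` seconds, so very short kernels still get a usable measurement.
    """
    iterations = 1
    while True:
        start = time.perf_counter()
        for _ in range(iterations):
            kernel()
        elapsed = time.perf_counter() - start
        if elapsed >= min_time:
            return elapsed / iterations
        iterations *= 10


# Made-up kernel in the spirit of a small vector loop: a[i] = b[i] + s * c[i].
B = [float(i) for i in range(256)]
C = [float(2 * i) for i in range(256)]


def saxpy_kernel() -> list:
    s = 0.5
    return [b + s * c for b, c in zip(B, C)]


seconds_per_call = micro_benchmark(saxpy_kernel)
print(f"{seconds_per_call * 1e6:.2f} us per call")
```

Unlike the “timeit” prefix approach, the measurement here lives inside the benchmark process, which is why kernels much shorter than a few seconds can still be timed.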

- Matthias

Note: during the initial review I suggested making the Halide test infrastructure compatible with the Google benchmark framework, because in the long term Halide can generate interesting micro-benchmarks.

Okay, I was intrigued and tried it, and it turns out you can make a patch for basic Google benchmark support in 40 minutes:


So there is a base now if someone wants to write benchmarks for it in the future.

- Matthias

Awesome Matthias!