I am not sure whether this belongs in the loop optimisations or the infrastructure topic, but I will try here.
I am looking for a set of loop-focused (micro) benchmarks. TSVC, which is part of the LLVM test-suite, ticks that box, but I found a few things that I would like to change. I wanted to check here whether benchmark modifications are allowed/desired or not. I guess that may depend on the type of modification, so I will give a few examples of the first things I noticed when looking at this benchmark.
The integer variant of abs() is called on float values. I am not entirely sure, but I think what was intended here was the float variant, fabs(). The current codegen is inefficient, and I was thinking about recognising this pattern in the compiler, but perhaps fixing the source is better in this case.
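For illustration, a minimal sketch of the pattern (the array names and the loop body are made up here, not the exact TSVC kernel):

#include <math.h>
#include <stdlib.h>

#define LEN 32000
static float a[LEN], b[LEN];

void kernel(void)
{
    for (int i = 0; i < LEN; i++) {
        /* Current pattern: the integer abs() forces a float-to-int
         * conversion (and back), which is both slow and lossy. */
        b[i] = abs(a[i]);

        /* Presumably intended: the single-precision float variant. */
        /* b[i] = fabsf(a[i]); */
    }
}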
At first glance it looks like the kernels are timed with calls to clock() before and after the for-loops, except that clock() is defined as #define clock() 0, so the reported times when you run this with lit are not for the kernel invocations but for everything else that is going on too.
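Roughly, the stub and one possible way to restore per-kernel timing look like this (a sketch, not the exact TSVC source; the helper name is made up):

#include <time.h>

/* What the benchmark effectively does today: every timestamp is 0,
 * so every reported kernel time is 0.00. */
/* #define clock() 0 */

/* A possible replacement: use the real clock() again and convert
 * to seconds around each kernel invocation. */
static double time_kernel(void (*kernel)(void))
{
    clock_t start = clock();
    kernel();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}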
Defining clock() properly will result in output like:
Loop       Time(Sec)   Checksum
S311       0.53        10.9507
S31111     0.04        10.9507
...
but that causes the test to fail because the reference output expects 0.00 for the time, so I probably want to change that (i.e. ignore the Time column in the correctness check).
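One possible way to do that, assuming only stdout is compared against the reference output, is to keep just the loop name and checksum on stdout and report the timings on stderr; the function name and format strings below are made up for illustration:

#include <stdio.h>

/* Print the timing on stderr and only the name + checksum on stdout,
 * so the reference-output comparison is unaffected by the time. */
static void report(const char *name, double secs, float checksum)
{
    fprintf(stderr, "%-10s %8.2f\n", name, secs);
    fprintf(stdout, "%-10s %10.4f\n", name, checksum);
}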
There is an ntimes variable that controls the number of times each kernel is invoked, which is a way to increase the runtime of the kernels, except that for some kernels this doesn't seem to work as their execution time is close to 0 (kernel optimised away?), so something is going on there that needs a bit of investigation. This ntimes variable is passed on the command line as an argument to the executable; in the CMakeLists.txt that could e.g. be 1670 and specified like so:
RUN_OPTIONS 1670 5
but I would like to double/triple/quadruple this value for some kernels.
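For the kernels whose time stays near zero, my guess is that nothing observable depends on the loop results, so the compiler can remove the whole ntimes loop as dead code. A minimal sketch of the structure I have in mind (the names and the kernel body are illustrative, not the exact TSVC code):

#include <stdio.h>
#include <stdlib.h>

#define LEN 32000
static float a[LEN];

int main(int argc, char **argv)
{
    /* ntimes comes in on the command line, e.g. via RUN_OPTIONS 1670 5. */
    int ntimes = argc > 1 ? atoi(argv[1]) : 1670;

    float sum = 0.0f;
    for (int nl = 0; nl < ntimes; nl++) {
        for (int i = 0; i < LEN; i++)
            a[i] = a[i] + 1.0f;

        /* If nothing like this consumes the result, the compiler is
         * free to delete the stores and the surrounding ntimes loop,
         * which would explain the near-zero execution times. */
        sum += a[LEN - 1];
    }
    printf("%10.4f\n", sum);
    return 0;
}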
Any hints/tips or alternatives (for TSVC) would be gratefully received.