How do other benchmarks deal with unstable algorithms or differences in floating-point results?
I haven't been following this thread, but this sounds like a typical
unstable algorithm problem. Are you always operating that close to
the tolerance level of the algorithm, or are there some sets of inputs
that will behave reasonably?
What do you mean by “reasonably” or “affect codes so horribly”?
The results accumulated through the algorithms in a physics pipeline are unstable, and unless the compiler/platform
guarantees 100% identical floating-point results, the outcome will diverge.
Do you think LLVM can be forced to produce identical floating-point results,
even when using different optimization levels or different CPUs?
Some CPUs use 80-bit FPU precision for intermediate results (on-chip, in registers),
while variables in memory only use 32-bit or 64-bit precision.
In combination with cancellation and re-ordering of operations, this can give slightly different results.
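To make that concrete, here is a minimal C++ sketch (illustrative only, not taken from the benchmark) where the evaluation order alone changes a result, and where x87-style 80-bit intermediates could change it again:

// Minimal sketch of order-dependence in 32-bit floating point.
#include <cstdio>

int main() {
    float big = 1.0e8f;   // above 2^24, so adding 1.0f is lost at float precision
    float small = 1.0f;

    // Mathematically both expressions equal 1.0, but the evaluation order
    // decides whether 'small' is absorbed before the cancellation happens.
    float a = (big + small) - big;  // 0.0f in strict 32-bit arithmetic
    float b = (big - big) + small;  // 1.0f

    // With x87-style 80-bit intermediates kept in registers, 'a' can come
    // out as 1.0f instead, depending on when the compiler spills to memory.
    std::printf("a = %g, b = %g\n", a, b);
    return 0;
}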
If not, the code doesn’t seem very useful to me. How could anyone rely
on the results, ever?
The code has proven to be useful for games and special effects in film,
but indeed this particular benchmark might not be well suited to LLVM testing.
I suggest working on a better benchmark that tests independent parts of the pipeline,
so we don't accumulate results (over several frames) but instead test a single algorithm at a time,
with known input and expected output. This avoids the instability and lets us measure the error of the output.
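A rough sketch of what I have in mind (hypothetical names, not existing benchmark code): run one stage on a fixed input and compare against a stored expected output with a tolerance, reporting the maximum error rather than requiring bit-identical results.

// Sketch of a per-algorithm check with a tolerance instead of exact equality.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Report the largest absolute difference between a stage's output and the
// stored reference, and pass/fail against a tolerance.
bool checkStage(const std::vector<float>& output,
                const std::vector<float>& expected,
                float tolerance) {
    float maxError = 0.0f;
    for (size_t i = 0; i < output.size() && i < expected.size(); ++i)
        maxError = std::max(maxError, std::fabs(output[i] - expected[i]));
    std::printf("max error: %g (tolerance %g)\n", maxError, tolerance);
    return maxError <= tolerance;
}

int main() {
    // Placeholder data standing in for one stage's output and its reference.
    std::vector<float> output   = {1.0f, 2.0000001f, 3.0f};
    std::vector<float> expected = {1.0f, 2.0f,       3.0f};
    return checkStage(output, expected, 1e-5f) ? 0 : 1;
}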
Anton, are you interested in working together on such an improved benchmark?
Thanks,
Erwin