Great summary, Kristof!
I do not know how often new benchmarks are added, but each addition would disrupt the compile time measurement. On the other hand, we just want to see a (hopefully negative) slope and can ignore the steps caused by new benchmarks being added.
Yes, adding or removing benchmarks will result in not comparing like-for-like between different benchmark runs.
My expectation is that we’ll need to find some solution for this, not just for aggregated compile time measurements,
but also for the execution time measurements. We can’t assume the benchmark sources will remain bit-identical
for a long time, since we encourage improving (i.e. changing) the benchmark sources.
Not allowing changes to the benchmark sources would slow down the rate at which they improve.
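
To make the like-for-like issue concrete, here is a minimal Python sketch of one possible approach: aggregate only over the benchmarks present in both runs, so that an added or removed benchmark does not show up as a step. This is not something that exists in the current infrastructure; the function name, the dict layout (benchmark name to compile time in seconds) and the choice of geometric mean are all assumptions made for illustration. It also does not detect a benchmark whose sources changed between runs; that would need some extra identifier per benchmark, such as a source hash.

```python
import math


def like_for_like_ratio(old_run, new_run):
    """Geometric mean of new/old compile time ratios over shared benchmarks.

    old_run and new_run are hypothetical dicts mapping a benchmark name to
    its compile time in seconds (assumed to be strictly positive).
    Benchmarks that appear in only one of the two runs are ignored.
    """
    common = sorted(set(old_run) & set(new_run))
    if not common:
        raise ValueError("no benchmarks in common between the two runs")
    log_sum = sum(math.log(new_run[name] / old_run[name]) for name in common)
    return math.exp(log_sum / len(common)), common


# Made-up numbers: bench_c was added between the two runs.
old_run = {"bench_a": 10.0, "bench_b": 20.0}
new_run = {"bench_a": 9.0, "bench_b": 18.0, "bench_c": 50.0}

ratio, used = like_for_like_ratio(old_run, new_run)
print(f"{ratio:.3f} over {used}")
```

A ratio below 1.0 means the shared benchmarks compile faster in the newer run; in this example bench_c does not affect the result.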