How many cores should I use to benchmark the performance of my compiler?

I wrote a compiler optimization that reduces the number of instructions in the target-dependent code.

I added it to the LLVM O3 pipeline, and it did reduce the amount of generated assembly code.
Next, I wanted to compare my new O3 with the original O3, so I used both of them to build two executables from the same source.
I'm sure that the two executables are different.

How many cores should I use for the benchmark?

When I restricted both executables to a single core and measured their run times, the times were about the same.

But when I didn't limit the number of cores and measured the run times again, the executable from my compiler was faster than the one from the original LLVM O3.
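
For reference, here is a minimal Python sketch of how the two measurements above could be repeated and summarized. It is an illustration under assumptions, not the exact setup I used: the helper names `time_runs`, `summarize`, and `compare` are made up for this example, the executable paths in the usage note are hypothetical, and pinning relies on the Linux `taskset` command being available.

```python
import statistics
import subprocess
import time


def time_runs(cmd, runs=10, core=None):
    """Return wall-clock durations (seconds) of `runs` executions of cmd.

    If `core` is given, the command is pinned to that CPU with the Linux
    `taskset` utility (assumption: Linux with util-linux installed).
    With core=None the scheduler is free to use any core.
    """
    if core is not None:
        cmd = ["taskset", "-c", str(core)] + list(cmd)
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the benchmarked program exits with an error.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Summarize one series of timings: median, minimum, and spread."""
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "stdev": spread,
    }


def compare(baseline_cmd, candidate_cmd, runs=10, core=None):
    """Time both executables under the same conditions and summarize each."""
    return {
        "baseline": summarize(time_runs(baseline_cmd, runs, core)),
        "candidate": summarize(time_runs(candidate_cmd, runs, core)),
    }
```

With hypothetical paths, `compare(["./bench_orig"], ["./bench_new"], runs=20, core=0)` would correspond to the single-core measurement and `compare(["./bench_orig"], ["./bench_new"], runs=20)` to the unrestricted one. If the difference between the two medians is smaller than the reported `stdev`, the gap may just be run-to-run noise.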

How can I determine which compiler is better?