FYI: Changing RunSafely.sh to only track user time

I am going to change the LLVM test-suite RunSafely.sh to only track
'user' time, instead of reporting 'user' + 'sys' time as it currently
does. This will probably cause a spike in nightly test numbers,
although hopefully it will be limited to the smaller tests.

The eventual goal is to report all numbers (independently), so that we
can control for noise better. However, until that happens it is better
to track the most stable & interesting number.

- Daniel
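Reporting all the numbers independently could look something like the following minimal Python sketch. It assumes a POSIX system where the standard `resource` module is available. The sketch runs a command and returns user time, system time and their sum separately. The function name `run_and_time` is hypothetical and is not part of RunSafely.sh.

```python
import resource
import subprocess
import sys


def run_and_time(argv):
    """Run argv as a child process and return its CPU times separately.

    Compares the RUSAGE_CHILDREN counters before and after the run, so the
    difference covers only children reaped in between (POSIX only).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=True)  # waits for the child to exit
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime  # CPU time in the program's own code
    sys_ = after.ru_stime - before.ru_stime  # CPU time the kernel spent on its behalf
    return {"user": user, "sys": sys_, "user+sys": user + sys_}


if __name__ == "__main__":
    # Time a short CPU-bound child process.
    times = run_and_time([sys.executable, "-c", "sum(range(10**6))"])
    for name, seconds in times.items():
        print(f"{name}: {seconds:.3f}s")
```

Because all three values are recorded, a report can pick whichever one it wants without re-running the benchmark.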

Daniel Dunbar wrote:

> I am going to change the LLVM test-suite RunSafely.sh to only track
> 'user' time, instead of reporting 'user' + 'sys' time as it currently
> does. This will probably cause a spike in nightly test numbers,
> although hopefully it will be limited to the smaller tests.
>
> The eventual goal is to report all numbers (independently), so that we
> can control for noise better. However, until that happens it is better
> to track the most stable & interesting number.
  
First, you should be aware that the Automatic Pool Allocation and SAFECode projects use the test-suite infrastructure for their testing. It is also used by our internal research projects and may be used by research projects at other universities. We rely on this infrastructure for our research, so changes you make can affect us.

Second, why are you only interested in user time? The reason why we had RunSafely.sh measure user + system time is that it gives a more accurate depiction of how well an optimization works. If a program spends most of its time in the OS, increasing speed in user-space doesn't gain us much. If a transform decreases user time but increases system time, then measuring only user time may show a speedup when measuring user+system will show a loss.

If noise is your concern, I think it would make sense to run the tests several times and report averages and standard deviations. System time should only report OS CPU time, so delays due to interrupts, I/O, etc. should not affect the results. User+system time measured over several runs should provide the most accurate evaluation of how well an optimization is improving performance.
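The averaging suggested above is shown in the following minimal sketch, which uses Python's standard `statistics` module. `summarize` is a hypothetical helper, and the timings in the usage block are illustrative values, not measurements.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) of repeated timing samples."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Hypothetical user+sys times (seconds) from five runs of one test.
    runs = [1.02, 0.98, 1.01, 0.99, 1.00]
    mean, stdev = summarize(runs)
    print(f"mean={mean:.3f}s stdev={stdev:.3f}s")
```

A change in a test's time can then be judged against its standard deviation rather than against a single run.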

Third, IIRC, I think you can change the nightly testers to report just user time by changing how they grab the time measurements in TEST.nightly.Makefile. I think the .time files created by the test suite record user, system, and user+system, and the testing Makefiles just grab the number that they want.
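Suppose the .time files contain lines in the format of POSIX `time -p` output: `real`, `user` and `sys`, each followed by a number of seconds. Selecting the wanted number is then a small parsing step. The sketch below assumes that format; the real layout of the test-suite's .time files may differ. `parse_time_output` is a hypothetical name.

```python
def parse_time_output(text):
    """Extract real, user, sys and user+sys seconds from `time -p` style text.

    Assumes lines such as "user 0.98"; any other lines (for example, an
    extra summary line) are ignored.
    """
    times = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in ("real", "user", "sys"):
            times[fields[0]] = float(fields[1])
    if "user" not in times or "sys" not in times:
        raise ValueError("no user/sys lines found")
    times["user+sys"] = times["user"] + times["sys"]
    return times


if __name__ == "__main__":
    sample = "real 1.50\nuser 1.20\nsys 0.25\n"
    print(parse_time_output(sample)["user"])
```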

Fourth, if you are intent on changing RunSafely.sh to measure just user time, why not add a feature that toggles whether to measure user+system or user time and *then* change the default behavior to measure user time? That way, I can easily toggle it back. What you've suggested above (more or less) is to remove a feature, add a feature, and then add back the feature you removed. It makes more sense to me to add the one feature you want and then to change the default behavior to use the new feature instead of the old one.
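The "add a toggle, then flip the default" approach might look like the following Python sketch. The `--time-metric` option and both helper functions are hypothetical and are not existing RunSafely.sh flags. The default is `user`, and passing `user+sys` restores the old behavior.

```python
import argparse

# Hypothetical knob: selects which CPU time figure gets reported.
METRICS = ("user", "user+sys")


def build_parser(default="user"):
    parser = argparse.ArgumentParser(description="select which CPU time to report")
    parser.add_argument("--time-metric", choices=METRICS, default=default,
                        help="time to report (default: %(default)s)")
    return parser


def reported_time(times, metric):
    """Pick the number to report from a dict holding 'user' and 'sys' seconds."""
    if metric == "user":
        return times["user"]
    return times["user"] + times["sys"]


if __name__ == "__main__":
    args = build_parser().parse_args(["--time-metric", "user+sys"])
    print(reported_time({"user": 1.20, "sys": 0.25}, args.time_metric))
```

Because the old behavior remains one flag away, changing the default does not remove anything that existing users depend on.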

-- John T.

What kind of optimization might change the system time?

The problem with measuring system time is that it can depend on many variable factors that have nothing to do with the process being tested. For instance, the number of files on disk, the amount of free space on disk, the total number of processes on the system, the amount of free memory pages on the system, and the size of the buffer cache can all affect how much work a system call has to do.

/jakob

Jakob Stoklund Olesen wrote:

>> Second, why are you only interested in user time? The reason why we had RunSafely.sh measure user + system time is that it gives a more accurate depiction of how well an optimization works. If a program spends most of its time in the OS, increasing speed in user-space doesn't gain us much. If a transform decreases user time but increases system time, then measuring only user time may show a speedup when measuring user+system will show a loss.
    
> What kind of optimization might change the system time?
  
Inlining can (increased code size may affect demand paging). Libcall optimizations can (they may change the amount of work done in userspace vs. kernel space). Automatic pool allocation can (it may change frequency of calls into the OS for memory allocation as well as paging and cache behavior). Anything that changes cache behavior can (because you can kick out OS data and code).
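The following Python sketch shows how a change in a library call can move work between user space and the kernel. It reads the same file twice, once with small `os.read` chunks and once with large ones, and records the process's user and system CPU time with `resource.getrusage` (POSIX only). Smaller chunks mean many more `read()` system calls for the same bytes. The numbers depend on the machine and are noisy. In Python, per-call interpreter overhead also inflates user time. `read_with_chunk` is a hypothetical helper.

```python
import os
import resource
import tempfile


def read_with_chunk(path, chunk_size):
    """Read a whole file with os.read in fixed-size chunks.

    Returns (bytes_read, user_seconds, sys_seconds) for this process.
    """
    start = resource.getrusage(resource.RUSAGE_SELF)
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, chunk_size)  # one read() system call per chunk
            if not data:
                break
            total += len(data)
    finally:
        os.close(fd)
    end = resource.getrusage(resource.RUSAGE_SELF)
    return total, end.ru_utime - start.ru_utime, end.ru_stime - start.ru_stime


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * (1 << 20))  # 1 MiB of data
        path = f.name
    try:
        for chunk in (16, 65536):
            n, user, sys_ = read_with_chunk(path, chunk)
            print(f"chunk={chunk:>6} bytes={n} user={user:.3f}s sys={sys_:.3f}s")
    finally:
        os.unlink(path)
```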

Other transforms are not optimizations, and understanding how they add overhead to both user and kernel time is important. SAFECode can increase calls to mmap()/mremap() when dangling pointer detection is enabled. Dynamic slicing increases time in the OS for trace file creation and trace file consultation; measuring solely user time may give a very inaccurate depiction of its execution time.

There's also the fact that some of the experiments I run compare compilation techniques to binary translation techniques. For example, I've compared Valgrind to SAFECode; it wouldn't surprise me if each one triggers very different behaviors in the kernel. I use the test-suite infrastructure to run these.

> The problem with measuring system time is that it can depend on many variable factors that have nothing to do with the process being tested. For instance, the number of files on disk, the amount of free space on disk, the total number of processes on the system, the amount of free memory pages on the system, and the size of the buffer cache can all affect how much work a system call has to do.
  
That's a good point, although I think that most of the examples above are constants on a given system (more or less). Going further, I'm guessing that the real reason some people want the change is so that they can compare performance numbers across *different* environments (e.g., for comparisons against GCC). Is this correct?

That seems like a reasonable thing to want, but it doesn't change the fact that I and others need to measure user+system time because we're doing transforms that can change system time (or, at the very least, we have to prove that our transforms don't change system time appreciably).

Getting back to the original issue at hand, if Daniel wants to track user time only in the nightly tester experiments, I think he can do that by changing the nightly tester Makefiles (again, I think the *.time files generated by RunSafely.sh include all the relevant data). I think it's just a matter of grep'ing the correct value out of the .time file. If that doesn't work, he can enhance RunSafely to record just user time; just as long as there's a command line option I can use to measure user+system, I'm happy.

-- John T.

You or other people are welcome to genericize this by adding a knob to control it. However, mainline LLVM should be most concerned with itself, not with out-of-tree projects. This is very similar to an API change.

I don't really expect it, but worst case, if the harness becomes completely unsuitable for you guys, you could just clone all the makefiles in your tree as a 'report'. This would insulate you from changes.

-Chris