Hi David,
Thanks for your comments.
You talk about killing off processes and the consequences. But that misses the
point: the build process should never drive a system to the point where it needs
to kill off processes not associated with the build. Build processes are not
mission-critical real-time processes, and monitoring memory provides a simple way
to realize the goal here. Memory-mapped (mmapped) memory is not an issue at all
if you use a set of policies that keep the system away from the critical point of
an out-of-memory failure. Oh sure, the kernel may decide to reclaim such memory
more aggressively than other allocations, but it will do so without loss of data,
as long as the system never actually runs out of memory.
Some simple observations (a code sketch of the resulting policy follows the list):
1.) If I don't have enough memory on a system, then I would hope the build process
would self-terminate, writing a log entry to inform me of the memory shortage.
2.) If there is sufficient memory to proceed slowly, then I would hope the build
process would note the limited memory in its logs and then run in a slow mode.
If things deteriorate and the risk of an out-of-memory crash increases, then I
would hope the build process would, as a last resort, self-terminate rather than
crash the computer. No processes should ever be killed except build processes.
3.) If there are sufficient resources, then I would hope the build process would
make every effort to use them to complete the build as quickly as possible. If
memory deteriorates during the build, then I would hope the build process could
slow down to mitigate the shortage, possibly killing some of its own build
processes if needed. In no case should the build system force the kernel to kill
off other processes.
4.) What if the computer(s) are running other jobs, and those jobs need all the
resources while a build is in progress? The sensible policy is for the build
process to yield the resources, terminating the build entirely as a last resort.
Why? Because a build process is never real-time critical, while other processes
may have real-time constraints. In that situation the operator of the system(s)
has made a scheduling error, and the build process should yield the resources as
a courtesy, writing logs to inform the operator of the circumstances. If the
other processes require resources faster than the build process can yield them,
then the whole system may crash -- but it won't be due to the build process.
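To make the four cases above concrete, here is a minimal sketch of such a
supervisor in Python, run alongside the build. It is only an illustration under
assumptions of my own: the thresholds, the five-second poll interval, the
free-plus-swap headroom figure, and the print-instead-of-act hooks are all
hypothetical placeholders, not a definitive implementation.

import time

# Hypothetical thresholds (kB) -- real values would come from knowing how
# much memory the build actually needs, as discussed below.
SLOW_KB  = 2 * 1024 * 1024   # below this headroom, drop into slow mode
ABORT_KB = 512 * 1024        # below this headroom, self-terminate

def headroom_kb():
    """Free physical memory plus free swap, read from /proc/meminfo (kB)."""
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            fields[key] = int(rest.split()[0])   # values are reported in kB
    return fields["MemFree"] + fields["SwapFree"]

def supervise_build():
    while True:
        free = headroom_kb()
        if free < ABORT_KB:
            # Cases 1 and 2, last resort: log and kill only build processes.
            print("memory shortage (%d kB free): aborting build" % free)
            return
        if free < SLOW_KB:
            # Cases 2 and 3: log it and throttle, e.g. reduce parallel jobs.
            print("low memory (%d kB free): running in slow mode" % free)
        # Otherwise, case 3: plenty of memory, let the build run flat out.
        time.sleep(5)   # the poll interval is arbitrary for this sketch

if __name__ == "__main__":
    supervise_build()

The point of the sketch is the shape of the policy, not the numbers: every
action is taken against the build's own processes, never anyone else's.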
All that is needed is some knowledge of how much memory these systems require
to do the work they are asked to do. Surely those numbers are known. And if they
are not, it shouldn't be too hard to acquire them. The build process then simply
needs to enforce a reasonable policy on memory use. The system memory statistics
in /proc/meminfo provide the following metrics, which should be sufficient to
implement such a policy:
MemTotal: Total usable RAM, i.e. physical RAM minus a few reserved bits and the
kernel binary code.
MemFree: The sum of LowFree + HighFree.
SwapTotal: Total amount of swap space available.
SwapFree: Memory which has been evicted from RAM and is temporarily on disk.
SwapCached: Memory that was once swapped out and has been swapped back in, but
is still also in the swap file. (If memory is needed, it doesn't have to be
swapped out again because it is already in the swap file; this saves I/O.)
Active: Memory that has been used more recently, usually not reclaimed unless
absolutely necessary.
Inactive: Memory which has been used less recently and is more eligible to be
reclaimed for other purposes.
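As a small illustration of how little code the monitoring side takes, here is a
Python sketch that reads those metrics and derives a rough "available to the
build" figure. Counting all of Inactive as cheaply reclaimable is an assumption
of mine; note also that kernels from 3.14 on publish a better kernel-computed
estimate directly, as a MemAvailable line in the same file.

def read_meminfo():
    """Parse /proc/meminfo into a dict of {field: size in kB}."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.split()[0])   # values are reported in kB
    return info

info = read_meminfo()
for name in ("MemTotal", "MemFree", "SwapTotal", "SwapFree",
             "SwapCached", "Active", "Inactive"):
    print("%-11s %12d kB" % (name + ":", info[name]))

# Crude availability estimate from the metrics above: free RAM plus
# memory the kernel could reclaim without real cost. Treating all of
# Inactive as reclaimable overstates things somewhat; MemAvailable,
# where present, is the kernel's own (better) version of this estimate.
estimate = info["MemFree"] + info["Inactive"]
print("estimated available: %d kB (MemAvailable: %s kB)"
      % (estimate, info.get("MemAvailable", "n/a")))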
I believe those few metrics can provide all the information the build process
needs to implement a sane policy.
Thanks again for your comments. Enjoyed reading them.
Karen