Some specific details should be kept in mind when benchmarking GNU/Linux systems, as compared to benchmarking other operating systems.
GNU/Linux is a multitasking, multiuser system, so system load will obviously skew results. On the other hand, this may be exactly the behavior we want to test: how will a GNU/Linux system perform under heavy use? There is no simple answer to this question. Again, careful data gathering and analysis may reveal interesting opportunities for GNU/Linux improvement.
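As a minimal sketch of this kind of data gathering, the short C program below records the system load alongside a benchmark run by reading the standard Linux /proc/loadavg file. Only the first three fields (the 1-, 5- and 15-minute load averages) are read here; the remaining fields vary and are ignored.

    /* loadsnap.c - record the current load average before (or after)
     * a benchmark run.  A minimal sketch: only the three leading
     * load-average fields of /proc/loadavg are parsed.
     */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/loadavg", "r");
        double avg1, avg5, avg15;

        if (f == NULL) {
            perror("fopen /proc/loadavg");
            return 1;
        }
        if (fscanf(f, "%lf %lf %lf", &avg1, &avg5, &avg15) != 3) {
            fprintf(stderr, "unexpected /proc/loadavg format\n");
            fclose(f);
            return 1;
        }
        fclose(f);
        printf("load average: %.2f (1 min)  %.2f (5 min)  %.2f (15 min)\n",
               avg1, avg5, avg15);
        return 0;
    }

Logging these figures with every run makes it possible to tell, after the fact, whether an odd result was produced by background load rather than by the code under test.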
Note that system load particularly affects latencies, so one should be very careful to keep the conceptual distinction between latency and throughput in mind.
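To make that distinction concrete, here is a minimal sketch in C that runs the same workload many times and reports both aggregate throughput and worst-case per-iteration latency; under heavy load the latency figure typically degrades much faster than the throughput figure. The work() function is a hypothetical placeholder for whatever operation is being benchmarked.

    /* latency_vs_throughput.c - report mean throughput and worst-case
     * latency for the same workload.  A minimal sketch; work() is a
     * placeholder for the operation under test.
     */
    #include <stdio.h>
    #include <sys/time.h>

    #define ITERATIONS 10000

    static void work(void)
    {
        volatile int i;                 /* placeholder workload */
        for (i = 0; i < 1000; i++)
            ;
    }

    static double elapsed_us(struct timeval *a, struct timeval *b)
    {
        return (b->tv_sec - a->tv_sec) * 1e6 + (b->tv_usec - a->tv_usec);
    }

    int main(void)
    {
        struct timeval start, t0, t1, end;
        double worst = 0.0, us;
        int i;

        gettimeofday(&start, NULL);
        for (i = 0; i < ITERATIONS; i++) {
            gettimeofday(&t0, NULL);
            work();
            gettimeofday(&t1, NULL);
            us = elapsed_us(&t0, &t1);
            if (us > worst)             /* track the slowest iteration */
                worst = us;
        }
        gettimeofday(&end, NULL);

        printf("throughput:         %.0f ops/s\n",
               ITERATIONS / (elapsed_us(&start, &end) / 1e6));
        printf("worst-case latency: %.0f us\n", worst);
        return 0;
    }

A tool that reports only the first number would completely miss the kind of multi-second freeze described in the next paragraph.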
A short example, provided by Jeremy Chatfield (Xi Graphics): some X servers will freeze for several seconds under heavy load, resulting in mouse movements that become quite jerky. This behavior is totally undesirable, and yet it is not measured by any X server benchmarking tool presently available.
Also, in a multiuser, multitasking system, the time function reports several items that must be analyzed separately: CPU time vs. system time vs. elapsed time (mentioned in article II). So, for a CPU benchmark, we should use CPU time, since the time spent during I/O or in system functions is irrelevant. In the case of a system benchmark, we will probably write a benchmark that spends most of its time in the kernel, so we will use system time. On the other hand, for our Linux kernel 2.0.0 compilation benchmark, we used elapsed time. There is no general rule to be followed here; one must use one's good sense.
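As a minimal sketch of how these three figures can be collected separately from within a benchmark program, the following C code uses getrusage() for user CPU time and system time, and gettimeofday() for elapsed (wall-clock) time. Again, work() is a hypothetical placeholder for the code under test; the sleep() call is there only to show elapsed time growing while CPU time does not.

    /* threetimes.c - measure user CPU time, system time and elapsed
     * time separately for the same workload.  A minimal sketch.
     */
    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <unistd.h>

    static void work(void)
    {
        volatile long i;                /* placeholder computation */
        for (i = 0; i < 5000000; i++)
            ;
        sleep(1);      /* elapsed time grows, CPU time does not */
    }

    static double to_seconds(struct timeval *tv)
    {
        return tv->tv_sec + tv->tv_usec / 1e6;
    }

    int main(void)
    {
        struct rusage ru;
        struct timeval t0, t1;

        gettimeofday(&t0, NULL);
        work();
        gettimeofday(&t1, NULL);

        getrusage(RUSAGE_SELF, &ru);
        printf("user CPU time: %.3f s\n", to_seconds(&ru.ru_utime));
        printf("system time:   %.3f s\n", to_seconds(&ru.ru_stime));
        printf("elapsed time:  %.3f s\n",
               to_seconds(&t1) - to_seconds(&t0));
        return 0;
    }

Whichever figure the benchmark reports, it should say so explicitly, since the three can differ wildly on a loaded machine.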
Some caveats also apply to NFS benchmarks: the present Linux NFS implementation runs in user space, not in kernel space as in the BSD Unices. Similarly, comparing the performance of Linux as a router against dedicated hardware is a case of comparing apples and oranges: even though Linux networking performance is very good, particularly with some DMA bus-mastering Ethernet controllers/drivers, it cannot possibly be compared to dedicated routing hardware.