Fair Benchmarks

It already feels strange that I often used to travel with two laptops, one running Windows XP and one running Mac OS X. Thanks to Apple’s switch to the Intel platform and its support for running Windows XP, this “double load” is no longer necessary. At first it felt odd to use a stylish Apple MacBook Pro as my main Windows laptop, but after a few weeks I would no longer want to do without both operating systems on the same machine. This laptop is so powerful that I now use it as my main computer for work and fun. At home I connect it to a 23-inch Cinema HD display, on which both Mac OS X and Windows XP look great. Apart from a few minor keyboard issues on the road (at home I use a wireless external keyboard), I can safely recommend this machine even to those who are looking for a pure Windows laptop. The keyboard issues are easily solved with a small tool from Microsoft called “remapkey.exe” (part of the “Microsoft Windows Resource Kit Tools”), which I used to enable the “Delete”, “PgUp” and “PgDn” keys; these are very important for writing code and other documents. The keyboard driver installed from the “Macintosh Drivers CD” does not yet enable these keys, which will probably be fixed in the final release of Apple’s “Boot Camp”.

With Windows and Mac OS running natively on the same hardware, it is now possible to compare the two operating systems fairly with respect to BrainVoyager QX performance. Since Linux can also be installed on the MacBook Pro, all three operating systems supported by BV can be compared on the same hardware. I have not yet installed Linux on the laptop and will focus here on initial Windows vs Mac comparisons. Note that the old G4 Apple laptops were much slower than Windows laptops. With the G5, Apple reached good performance on desktop systems, but this processor seems not to be suited for integration into a 1-inch-thick laptop, which is likely the main reason why Apple switched to Intel chips.

The detailed specs of my MacBook Pro laptop are:

  • 2.16 GHz Intel Core Duo processor
  • 2 GB memory (DDR2 SDRAM)
  • 2 MB L2 Cache
  • 667 MHz Bus Speed
  • 7200 RPM Hard Disk
  • ATI Radeon X1600 Graphics (256 MB VRAM)
Tests were run with BrainVoyager QX 1.7.8. It is important to realize that many processing routines depend not only on the processor but also on the speed of the hard disk. The performance of multi-subject GLM analyses, for example, depends more on hard disk performance than on processor performance. Since I want to focus here on raw processing power, I chose benchmarks that either do not require reading data during processing or allow a direct comparison of file access vs memory access.
In the first benchmark, a single-run VMR-VTC GLM (“Objects” example data) was executed, once with file access and once with memory access. Note that it is important to run the file access version after a reboot: if the data has been used before, the operating system may have cached it, so that a second run with file access might actually retrieve the data from memory.
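The file vs memory comparison can be illustrated with a small, generic timing sketch. This is not BrainVoyager code; the function names (`checksum_from_file`, `checksum_from_memory`, `run_demo`) and the byte-summing workload are assumptions made purely for illustration. Note that the temporary file is read right after it was written, so the operating system will most likely serve it from its cache, which is exactly the effect described above.

```python
import os
import tempfile
import time


def checksum_from_file(path, chunk_size=1 << 20):
    """File-access variant: read the data chunk by chunk while processing."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += sum(chunk)
    return total


def checksum_from_memory(data):
    """Memory-access variant: the data is already loaded before processing."""
    return sum(data)


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def run_demo(n_bytes=2_000_000):
    """Hypothetical demo: time the same workload with file and memory access."""
    payload = bytes(range(256)) * (n_bytes // 256)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        file_result, file_secs = timed(checksum_from_file, path)
        mem_result, mem_secs = timed(checksum_from_memory, payload)
    finally:
        os.remove(path)
    return file_result, mem_result, file_secs, mem_secs


if __name__ == "__main__":
    f_res, m_res, f_secs, m_secs = run_demo()
    print(f"file access:   {f_secs:.4f} s")
    print(f"memory access: {m_secs:.4f} s")
```

For a real "cold" measurement, the file would have to be read before the operating system has had any chance to cache it, for example directly after a reboot.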
In the second benchmark, an SRF-MTC GLM was executed, again once with file access and once with memory access. The “CG_LHRH_WM.srf” mesh file used here was created by merging the reconstructed left and right hemispheres obtained from the automatic segmentation of the “CG2_3DT1FL_SINC4_TAL.vmr” data set from the “Objects” example. Furthermore, two MTCs were created from the VTC used in the first benchmark and merged into a single whole-cortex MTC file. A snapshot of this benchmark, half from the Mac OS X screen and half from the Windows screen, is shown above.
The final benchmark is the most interesting one because it compares performance when using one vs two processor cores. All tests so far use “one thread”, which means that the overall processor load on the dual-core machine reaches only 50 percent. In BV QX 1.7, the “sigma” contrast enhancement filter is multi-threaded. This filter is time-consuming and is used during standard automatic segmentation as well as during advanced segmentation (“Enhance” button). For the benchmark, the “Enhance” function in the “Advanced Segmentation Tools” was run with 3 cycles on a “big” 0.5 mm data set (an interpolated VMR from the “Objects” example).

By default, BVQX uses as many threads for multi-threaded functions as there are processors (or cores) detected on the system. This default behavior is enabled with the “Same as number of processors” option in the “Multi-threading” tab of the “Global Options and Preferences” dialog. On a dual-core machine, the “NrOfThreads:” spin box thus contains the number “2”. In order to run the “Enhance” benchmark with only one processor core, the option “Set number of threads” has to be selected, which allows the “NrOfThreads” spin box to be set to “1”. Alternatively, one processor core can be turned off using operating system tools.
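The logic behind these two options can be sketched in a few lines. This is only an illustration of the idea, not BrainVoyager's actual implementation; the function `resolve_thread_count` and its parameter names are hypothetical and merely mirror the dialog labels.

```python
import os


def resolve_thread_count(same_as_processors=True, nr_of_threads=1):
    """Hypothetical helper mirroring the two options in the dialog.

    same_as_processors=True  -> use one thread per detected processor/core
    same_as_processors=False -> use the explicitly chosen nr_of_threads
    """
    if same_as_processors:
        # os.cpu_count() may return None if the count cannot be determined.
        return os.cpu_count() or 1
    if nr_of_threads < 1:
        raise ValueError("nr_of_threads must be at least 1")
    return nr_of_threads


if __name__ == "__main__":
    print("default:", resolve_thread_count())
    print("single-core benchmark:", resolve_thread_count(False, 1))
```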

Here are the results of the performed benchmarks:

[Figure: benchmark results]

Overall, the laptop performed very well on both platforms, with very similar results on all GLM computations. The largest difference is observed in the “Enhance” benchmark, with Mac OS X substantially outperforming Windows. Even more surprising is the observation that the tests with 2 cores are more than twice as fast as the corresponding ones with 1 core on either platform. A doubling of performance should only be reached at the theoretical maximum, that is, with full parallelization and no losses due to overhead. With imaging data, this maximum can be approached by partitioning a volume (or mesh) into subvolumes (sub-lists of vertices) for “voxel-wise” processing. But here we observe a larger performance gain than theoretically expected...
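The partitioning idea and the notion of speedup can be sketched as follows. This is a generic illustration under stated assumptions, not the code of the “sigma” filter: `partition`, `process_voxelwise` and `speedup` are hypothetical names, and the data is a plain list of values. Because of Python's global interpreter lock, this pure-Python version shows the structure of the approach only and will not itself run faster with more threads.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous (start, stop) ranges."""
    base, extra = divmod(n_items, n_parts)
    bounds, start = [], 0
    for i in range(n_parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def process_voxelwise(voxels, func, n_threads):
    """Apply func to every voxel, one sub-list of voxels per thread."""
    def work(bound):
        start, stop = bound
        return [func(v) for v in voxels[start:stop]]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map preserves the order of the sub-lists.
        parts = list(pool.map(work, partition(len(voxels), n_threads)))
    return [v for part in parts for v in part]


def speedup(t_one_thread, t_n_threads):
    """Speedup factor; with n threads the ideal value is n."""
    return t_one_thread / t_n_threads


if __name__ == "__main__":
    print(partition(10, 3))
    print(process_voxelwise(list(range(10)), lambda v: v * 2, 2))
```

With two threads, a speedup of exactly 2 corresponds to perfect parallelization; any value above 2 cannot be explained by the partitioning alone.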

This figure shows the duration of the “Enhance” test with 1 thread on Mac OS X


This figure shows the duration of the “Enhance” test with 2 threads on Mac OS X


The “Enhance” results are really surprising. The only explanation I can offer at this point is that the Core Duo processor uses dynamic clock speed adjustments, which might be a) better exploited on Mac OS X and b) better exploited when the system has to work hard than when it runs at a 50% load. In any case, I am now highly motivated to rewrite computation-intensive parts of the code to use multi-threading, which will boost BV QX performance substantially on multi-processor, multi-core systems.